A Survey of the Theory of Selective Information
and Some of its Behavioral Applications


Part I. The Discrete Theory


1.  Introduction 


There is a widespread belief - most forcefully articulated by
Norbert Wiener [99] - that we are undergoing a new scientific revolution,
one comparable in scope and scientific significance to that of the last
century; but where the dominant concepts in the previous development were
energy, power, and efficiency, the central notions here are information,
communication, and feedback. Many current problems stem from attempts to
transmit information and to exercise control effectively rather than to achieve
an efficient use of energy; little more than chaos would result, for example,
were the design of a high-speed computer approached from the energy standpoint.
"Information is information, not matter or energy. No materialism which does
not admit this can survive at the present day." [p. 155, 99].

What then is information? How is it measured? What scientific
statements can be made using the term?

Several schools of thought have developed, each giving its own
answers to these questions. In this report we propose to examine the answers
of one of these schools and to indicate some consequences for problems of
psychology. But before we turn to this, a certain amount of background material
on the history, orientation, and relation of information theory to other theories
is appropriate.

1. Most often the title 'information theory' is used without the prefix
'selective'; however, some feel that the simpler title is misleading, especially
since there exists a theory of structural information and one of semantic information.

It is clear that if Wiener and others are correct in their views,
the intuitive concept 'information' must be given at least one precise meaning
and maybe more. Considering the variety and vagueness of its meanings in
everyday usage, it is an a priori certainty that objections will be raised
against any particular formulation, which will surely ignore some of these
meanings. This problem - if it be such - has been met many times in science;
we need only think of words and concepts like force, energy, work, etc. It
is doubtful that a formal definition ever stands or falls because of such
debates; it is rather the power and depth of the resulting theory which
determines its ultimate fate.

Within the last two decades two distinct attempts have been made
to deal with the notion of information, one in Europe, and one in America;
these have been complementary rather than competitive. Both theories seem
to have arisen from much the same class of applied problems: communication
involving electrical signals. The European school, in which the names of
Gabor [21, 22, 23, 24, 25, 26] and MacKay [54, 55, 56] are the most important,
has been concerned with the problem of the information contained in a
representation of a physical situation. As seems intuitively reasonable, the concepts
of size and dimensionality are important here. In America, largely as the
result of work by Wiener [99, 100] and Shannon [87, 88, 89, 92, 93, 94], a
theory of information transmission has developed whose dominant concepts are
those of selection, statistical possibilities, and noise.

In this report we shall not go into a detailed study of the notions
of structural and metrical information (the European school), for this theory
has had, so far as we have determined, almost no effect on behavioral
applications. Of interest to the behaviorist, however, is the apparently overlooked
fact that the basic concept of structural information theory is identical to
the central assumption of factor analysis. Both theories are concerned with
the number of independent dimensions which are required to represent a certain
class of data, and the model of any particular situation is a
point in Euclidean n-space. If we are correct in this observation, it is
interesting that basically the same concept has been independently arrived
at by both the physicists and the psychologists, and it may be unfortunate
that each is unaware of the work of the other.

There are, of course, marked differences of emphasis which reflect
the diverse origins and problems. For example, the European information
theorists have, in the theory of metrical information, examined in some detail
the basic natural units in which the several dimensions can be scaled. Their
examples are entirely drawn from physics and so it is not immediately obvious
whether any of the scaling work in the behavioral sciences is an independent
development of metrical information notions or whether they are totally
different. On the other hand, the factor analysts have developed an elaborate
matrix machinery suited to the determination of the approximate dimensionality
of the Euclidean representation of certain types of data. A comparable machinery
does not appear to exist in structural information theory, though, of course,
the close relation of the structural model to matrix theory is apparent.

Our concern, however, is with selective information theory. The
central observation of this theory is that for a great many purposes - in
particular in the design of communication equipment - one is never concerned with
the particular message that is sent but rather with the class of all messages
which might be sent and the probability of the occurrence of each. "We are
scarcely ever interested in the performance of a communication-engineering
machine for a single input. To function adequately it must give a satisfactory
performance for a whole class of inputs, and this means a statistically
satisfactory performance for the class of inputs which it is statistically expected
to receive." [p. 55, 99]. From this point of view, information is transmitted
by a selection from among certain alternatives, and the contention is that a
selection of an a priori rare event conveys more information to the receiver
than does one which is expected. This use of 'information' obviously ignores
all questions of meaning. "It is important to emphasize, at the start, that
we are not concerned with the meaning or the truth of messages; semantics
lies outside the scope of mathematical information theory." [p. 383, 7].


1. Carnap and Bar-Hillel [6] have presented a theory of semantic information
which is based on Carnap's work on inductive logic. Since their approach is
different from that of selective information theory, and since, as far as we
know, there have been no behavioral applications of it, we have elected not to
summarize it here. It may, however, become important, and should therefore not
be neglected by the serious student of this area.

It may be useful to introduce at this point three common-sense
observations which will be given a precise meaning in the presentation of the
theory of selective information - precise to the point where numbers can be
attached to them.

1. A person communicating over a noisy telephone line can get less
across in a given period of time than he can over a perfectly clear line.

2. Not every letter, nor indeed every word, of a message in any
natural language is as important as every other one in getting the sense of
the message. For example, the missing letter in 'q_iet' or the missing word
in 'many happy ______ of the day' can be filled in, with a high probability
of being correct, by anyone knowing English, and therefore in the above context
they do not carry much important information.

3. Every person seems to have a limited capacity to assimilate
information, and if it is presented to him too rapidly and without adequate
repetition, this capacity will be exceeded and communication will break down.

As they stand, it is not immediately obvious that at least some of
these statements are not concerned with semantics, or, for that matter, that
the whole problem of information transmission is not primarily semantic. One
major contribution of information theory is in showing that much of what is
implied or suggested in these examples and others like them can be given a
precise and useful meaning by a statistical treatment.

We shall delve into this more deeply in the following sections;
but first, let us discuss briefly some of the origins of the theory and of
the developing interest of behavioral scientists in it. Electrical communication
engineers gradually had been gaining experience in the handling and
transmission of information since the early days of the telegraph, telephone, and
radio, and during the 1920's this experience began to be formalized as a theory.
A most important early paper was that of Hartley [33] in 1928, where the
logarithmic measure so characteristic of modern information theory was employed
in a simple form. The maturation of the theory, however, resulted from the
work of two men, Norbert Wiener of M.I.T. and his former student C.E. Shannon
of the Bell Telephone Laboratories. Shannon's papers of 1948 [87, 88] are now
the classic formulation of the theory, though the more mathematically inclined
reader will find McMillan's recent presentation of the central theorems more
satisfactory [63].

1. A much more complete history of both the American and European schools has been
given by Cherry [7, 8].

Both Wiener and Shannon had much larger interests than improved
electrical communication, and they sensed the wider implications of the theory
and of several related concepts - feedback being one of the most important.
In a series of conferences and seminars dating back to the 1940's and continuing to
the present, these concepts - sometimes classed under the title of 'Cybernetics,'
a word coined by Wiener for this somewhat nebulous discipline - and their
applications to the various behavioral sciences have been examined and debated. These
meetings have been held largely in the East, many of them in Cambridge, and,
as a consequence, the impact of information theory, which has been so strong
along the Eastern seaboard, has been less marked in the West.

Many of the empirical sciences dealing with human behavior -
psychology, linguistics, physiology, biology, psychophysics, social psychology,
neurology, medicine, anthropology - have had representatives at some of these
seminars; indeed, these men have organised and dominated many of the
meetings. From this there has emerged a small group of analytically inclined
behavioral scientists who believe that information theory is, or can be, a
useful tool in handling some problems in various of the disciplines. We shall
try to indicate some of the uses, and the usefulness, of the theory in the
latter half of this report.

1. In the introduction to his book Cybernetics [99], Wiener presents a detailed
history of the early meetings.


Our organisation of the material is into two parts. In the first,
we shall try to present a motivated synopsis of the discrete theory of selective
information. The presentation is most deeply influenced by Shannon's, although
there has been some departure from his. In the second part we shall be concerned
entirely with applications of the theory to problems in psychology. An attempt
has been made to group the papers discussed according to the conventional
categories used in psychology. A short summary of Shannon's theory of
continuous communication systems appears in an appendix. While this theory is
of great importance in electrical applications, it has so far been of minor
significance in behavioral applications, and so it was felt that it should
be separated from the main body of the report.


2. General Concepts
2.1  Communication  Systems 


Information transmission always occurs within a certain physical
framework which in general may be called a communication system. Basically
such a system consists of three central parts: a source of messages, a channel
over which the messages flow, and a destination for the messages. The source,
which very often is a human being, generates messages (and so information; see
section I.2.3) by making a series of decisions among certain alternatives. It
is the sequence of such decisions that we call a message in a discrete system.
These messages are then sent over the channel, which is nothing more than an
appropriate medium which establishes a connection having certain physical
characteristics between the source and the destination. Technically, this
picture is incomplete, since the decisions made by the source must be put
into a form which is suitable for transmission over the channel, and the
signals coming from the channel must be transformed at the destination into
stimuli acceptable to it. Thus, between the source and the channel we
introduce a transmitter which serves to "match" the channel to the source, and
between the channel and the destination we introduce a receiver which "matches"
the channel to the destination. In other words, the transmitter encodes the
message for the channel and the receiver decodes it. A schematic diagram of
the system is shown in Fig. 1.


[Schematic diagram: Source, Transmitter, Channel, Receiver, Destination]

Fig. 1

It is entirely possible to have transmitters which so encode
messages that it is not possible to design a receiver which will completely
recover the original message. For example, if one has a transmitter which
encodes all affirmative statements such as "O.K.," "yes," "all right," etc. into
the same signal, then no device can be built which will translate that signal
back into the particular word chosen by the source. A transmitter in which
this is the case is called singular, otherwise it is called non-singular.
(These terms arise if one thinks of the transmitter as a many-one
transformation or as a one-to-one transformation.) When the transmitter is non-singular
it is possible to design a receiver which is capable of complete recovery of
the original message; in other words, there exists a receiver which is the
inverse of the transmitter. Throughout our discussion we shall assume that the
transmitter is non-singular and that the receiver is its inverse. In effect,
this means that we can ignore them in our discussion and suppose that the
source and destination are both matched to the channel.
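As an illustration only, the distinction between singular and non-singular
transmitters can be sketched in a few lines of Python. The function names and
the two code tables below are hypothetical and are not taken from the theory
itself; the sketch treats a transmitter as a lookup table from messages to
signals and a receiver as the inverse table, which exists only when the table
is one-to-one.

```python
def is_non_singular(encoding):
    """Return True if no two messages share a signal (a one-to-one map)."""
    signals = list(encoding.values())
    return len(set(signals)) == len(signals)


def build_receiver(encoding):
    """Invert a non-singular transmitter; raise ValueError if it is singular."""
    if not is_non_singular(encoding):
        raise ValueError("singular transmitter: no inverse receiver exists")
    return {signal: message for message, signal in encoding.items()}


# Hypothetical singular transmitter: every affirmative maps to one signal.
singular = {"O.K.": "1", "yes": "1", "all right": "1", "no": "0"}

# Hypothetical non-singular transmitter: each message has its own signal.
non_singular = {"O.K.": "00", "yes": "01", "all right": "10", "no": "11"}

receiver = build_receiver(non_singular)
# Sending each message through transmitter and receiver recovers it exactly.
recovered = [receiver[non_singular[message]] for message in non_singular]
```

With the singular table, the signal "1" could have come from any of three
messages, so build_receiver refuses to construct an inverse.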

Our abstract communication system seems fairly complete except
that it does not allow for the possibility that more than one source may be
using the same channel at the same time. Certainly this can happen. It
occurs when, by mistake, one telephone line is carrying two conversations
at once (cross-talk). It also happens in telephone or radio communication
when there is static in addition to the desired message. In all such cases
the messages from sources other than the one under consideration - which we
will simply call the source - cause interference with messages from the source.
Such interference may be minor and may be of no effect on the intelligibility
of the message, as for example in the usual low-level telephone static, or it
may be most destructive, as when another conversation is cut in. Another example
which one might tend to put into the same category of interferences is, say,
the 60-cycle hum which is common to so many cheap radios and which is eliminated
in better communication systems only by careful design, a lot of hard work,
and a certain amount of luck. If the hum level is high enough it certainly
can lower the intelligibility of speech. However, there is an important
difference between the problem of interference from hum and that due to static
or other conversations. The former is completely predictable, given a short
sample to determine the exact frequency, the phase, and the amplitude, and
so if it exists in the channel one can build into either the transmitter or
the receiver a network to subtract it from the resulting signal, leaving
only the message. Static, hiss, and crosstalk cannot be predicted in any
detail from any amount of past evidence about them; therefore, once they
enter the channel, they cannot be characterized in full and then subtracted
from the signal, but they must rather be accepted and compensated for in
other ways.

Thus in our abstraction we must conceive a second source (which
may in fact be several lumped together) also feeding signals into the channel,
which has the property that (for the problem under consideration) neither the
source nor the destination can predict in detail the messages which will
emanate from it. The source or the destination may have or may obtain
statistical data about the nature of this second source; for example, in an electrical
communication system the average power of the second signal may be measured.
Such a source is known as a noise source and the signal it generates will be
called noise. Clearly, these are often relative terms and what in one context
is noise may be the message in another. This, then, completes our model of a
communication system.

When there is a noise source in a system it is conventional to
speak of the channel as being noisy, but it is well to keep in mind that this
is but an abbreviated, and slightly misleading, way of speaking. The noise
signal is not an invariant of the channel, as are its physical characteristics.
It is clear that one can change the amount of noise in a system while keeping
the physical characteristics of the channel, the source, and the destination
the same. In any given problem under consideration, the noise level will
presumably remain constant and so it can be thought of as a property of the
channel, but as we shall see it is a property which must be handled very
differently in the theory from the physical characteristics of the channel.


2.2 Noiseless Systems


No communication system is ever noiseless in the sense that there
is no noise signal. For example, in any electrical system there must always be
signals present which result from the random agitation of molecules - thermal
noise. This can be a serious problem in a high-gain amplifier, but it is not
in a telephone. The point, of course, is that noise is not in and of itself
bad, but only when it causes a significant interference in the messages sent
by the source. The only pertinent feature of noise is whether it causes the
destination to 'think' a different message was sent from the one actually sent.
Thus if the noise level is low compared with the signal level, so low that it
does not significantly alter the message as it passes along the channel, then
it may be completely disregarded and the system can be treated as if there
were no noise present.

Since we have assumed by definition that the effect of noise is
unpredictable in advance except statistically, all we shall be able to state
about the effect of noise on messages - and all we need to state - is the
probability that it changes one signal into another. If the signals sent
(in a given situation) are always received correctly, then we say the system
(or the channel) is noiseless. It must always be kept in mind that if we
change the level at which the transmitter operates, or the level of the
noise signal, we may change the system from a noiseless one to a noisy one.
Being noiseless is a property of the whole system and not of the channel
alone.
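The statistical description of noise can be made concrete with a small
Python sketch. It is offered only as an illustration under stated
assumptions: the table layout (probability of each received signal given
each sent signal), the function name, and the example values are all
hypothetical, and "received correctly" is read here as "the destination can
always tell which signal was sent."

```python
def is_noiseless(transition):
    """Check whether a system is noiseless.

    transition[sent][received] is the (assumed known) probability that
    `received` arrives when `sent` was transmitted. The system is treated
    as noiseless when no received signal can arise from two different
    sent signals, so the destination is never in doubt about what was sent.
    """
    origin = {}
    for sent, row in transition.items():
        for received, probability in row.items():
            if probability > 0:
                # Remember the first sender seen for this received signal;
                # a second, different sender means possible confusion.
                if origin.setdefault(received, sent) != sent:
                    return False
    return True


# Same channel and signals, two hypothetical noise levels.
low_noise_level = {"a": {"a": 1.0}, "b": {"b": 1.0}}
high_noise_level = {"a": {"a": 0.9, "b": 0.1}, "b": {"b": 1.0}}
```

The two tables differ only in the noise level, yet the first describes a
noiseless system and the second a noisy one, which is the sense in which
being noiseless belongs to the whole system rather than to the channel.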

In principle, it is not necessary to deal separately with the theory
of the noiseless and noisy cases, for the former is but a special case of the
latter. The presentation, however, is simpler if we bring in the complications
one at a time, so we shall examine the noiseless case first (section I.3) and
then the noisy one (section I.4).

2.3 Bit - a Unit of Information

To carry out the program outlined in the introduction, namely,
to make precise and measurable some of our intuitions concerning the
transmission of information, it is necessary to introduce a unit in terms of which
amounts of information may be measured. The central observation which is
needed before one can arrive at an appropriate unit is that a message conveys
information only by its relation to all the other messages which might have
been received. Suppose a person is asked whether he smokes. If we have no
prior information other than population statistics on smoking, then all we
know is the probability that he, as a random selection from the population,
will answer 'yes,' and when he selects one of these alternatives and transmits
it, some information has been conveyed. But if it is known a priori, e.g.,
from previous conversations or from seeing him smoke, that he does smoke, then
with probability one the answer will be 'yes' and the receipt of 'yes' from
him will not convey any (new) information. In effect, our prior knowledge
reduced the set of possible messages to one with but one element, and so far
as we are concerned there was no choice to be made, and thus no information
could be transmitted.

The simplest condition, therefore, under which information can be
transmitted is when there is a choice between two alternatives. The maximum
uncertainty in such a choice between two alternatives exists when they are
equally probable, hence the maximum information is conveyed from a choice
between two alternatives when they are equally likely. We take such a choice
to be our unit of information. That is, whenever a choice is made between
two a priori equally likely alternatives (no matter what they are) we shall
say that one unit of information has been transmitted by the choice. According
to Shannon, Tukey proposed that the unit be called a bit - a shortened
form of binary digit - and that term is commonly used. Goldman [2?] prefers
the term 'binit' in order to avoid such expressions as 'a bit of information,'
which, unfortunately, has quite another everyday meaning, but we shall conform
to common usage. All of our statements about information transmission,
therefore, will be given in this unit; we shall speak of so many 'bits in a
message,' or the 'bits transmitted per second,' or the 'bits per English letter,'
etc.

A second intuitively desirable feature in measuring information is
that it should be additive. We shall formalize exactly what this means later
(in section I.3), but for the present it is enough to say that if two independent
choices are made between two a priori equally likely alternatives, then a
total of two bits is transmitted.

As an example of how the bit may be used, consider a set of elements
(think of them as letters of an alphabet) in which each element is equally
likely to be selected. Further, suppose that the number $n$ of elements is of
the form $2^N$ where $N$ is an integer. Question: when an element is chosen from
this set, how many bits of information are conveyed, i.e., for this set, how
many bits per element are there? The answer is $N$. We can easily show that
there are no more than $N$ bits, for suppose we divide the set into half,
each half being composed of $2^{N-1}$ elements. The element being chosen is in one
half or the other, and the decision as to which half it is in is a decision
between two equally likely alternatives (since each element has the same
probability of being chosen) and so it conveys one bit of information. Take
that set and divide it in half, each half now consisting of $2^{N-2}$ elements.
Again, the decision as to which of the two sets contains the desired element
is between two equally likely alternatives, and so another bit of information
is transmitted in isolating it. Continuing the process until the element is
isolated clearly requires $N$ steps, and, assuming additivity, $N$ bits of
information are transmitted. The fact that all the elements were assumed to be
equally likely should suggest that no scheme can be devised to isolate the
element in fewer than $N$ binary decisions; this can be proved to be the case.
We shall not prove it, for the conclusion that there are $N$ bits per symbol
in this situation will follow from much stronger and deeper results which
we shall present later.
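The halving argument can be traced step by step in a short Python sketch.
This is an illustration rather than a proof; the function name and the
32-element example set are hypothetical, and the sketch simply counts the
binary decisions that the procedure described above requires.

```python
def isolate_by_halving(elements, target):
    """Count the binary decisions needed to isolate `target`.

    Assumes the number of elements is a power of two and that all
    elements are equally likely, as in the argument in the text.
    """
    n = len(elements)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("number of elements must be a power of two")
    if target not in elements:
        raise ValueError("target must be one of the elements")
    candidates = list(elements)
    decisions = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # One choice between two equally likely halves conveys one bit.
        candidates = first if target in first else second
        decisions += 1
    return decisions


alphabet = list(range(32))  # 2**5 equally likely elements
# The number of decisions is the same whichever element is chosen.
counts = {isolate_by_halving(alphabet, element) for element in alphabet}
```

For the 32-element set every element is isolated in exactly five decisions,
in agreement with the count of one bit per halving.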

The English alphabet consists of 26 letters which with a space,
comma, period, semicolon, colon, and question mark totals to $32 = 2^5$
elements. Were we to suppose them to be chosen independently and with equal
probabilities (which is patently false) then each letter of a message would
yield five bits of information. While this is clearly not a correct estimate
of the bits per letter in English prose, it does stand as an upper bound to
this number. Later (section II.2) we shall discuss more precise estimates
which show that it is actually somewhere between 1 and 2 bits per letter.

Continuing with the example, observe that when $n = 2^N$ then
$N = \log_2 n$ by definition of the logarithm, and so we may say that in this
situation there are $\log_2 n$ bits of information per element. We will find
that our subsequent discussion of information transmission results in
logarithmic measures slightly more complicated than this.
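The logarithmic measure for equally likely elements is easy to check
numerically. The following Python sketch is only an illustration; the
function name is hypothetical, and it covers nothing beyond the equally
likely case discussed so far.

```python
import math


def bits_per_element(n):
    """Bits conveyed by one choice among n equally likely elements."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.log2(n)


# 26 letters plus space, comma, period, semicolon, colon, question mark.
symbols = 26 + 6
upper_bound = bits_per_element(symbols)

# Additivity: a choice among 4 followed by an independent choice among 8
# conveys as much as a single choice among 4 * 8 = 32.
combined = bits_per_element(4) + bits_per_element(8)
```

Here upper_bound is the five bits per letter obtained in the text under the
(patently false) assumption of independent, equally likely symbols, and
combined gives the same figure by adding the bits of two independent choices.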


3. The Discrete Noiseless System

In this section we shall discuss what is known as the discrete
noiseless communication system. The definition of a noiseless system has
been given in section I.2.2, and it may be summarized by saying that in such
a system there is never any confusion at the destination as to which signal
(of a known class of signals) was emitted by the transmitter. This, of course,
does not mean that the signal received is necessarily identical to the signal
sent, but only that no confusion can arise as to what signal was sent.

The word 'discrete' refers to the nature of the information source,
and it describes a source which generates messages by temporally ordered
sequences of selections from a finite set of possible choices. Thus, the
discrete case includes a vast amount of familiar communication, such as the
selections made from an alphabet to generate words and sentences. But the
theory of this section does not include sources which can select from all
continuous bounded functions on the interval 0 to 1; we have outlined that
theory in the appendix.


3.1 Channel Capacity


In any communication system the transmitter serves to
match the source to the channel. Signals emanating from the transmitter,
which are assumed to be in one-to-one correspondence with the selections
made by the source, are propagated along the channel. As far as this
communication process is concerned, the relevant effect of the physical characteristics
of the channel is to determine how many different signals can be transmitted
over it in a given space of time. Roughly, this is what we mean by the
capacity of the channel. Formally, let $N(T)$ denote the number of different
signals which have the following properties:

i. each signal can be emitted by the transmitter as a result of
selections by the source,

ii. each signal is admissible on the channel, i.e., each signal is
compatible with the physical characteristics of the channel,

and iii. each signal is of duration $T$ time units.

From the discussion of section I.2.3, it is suggested (though by no
means proved) that if each of these $N(T)$ signals were equally likely then
there would be $\log_2 N(T)$ bits of information per signal of duration T, hence

$$C(T) = \frac{\log_2 N(T)}{T}$$

bits per signal per unit time. Now, since it is plausible to suppose that
maximum information is transmitted when each signal is equally likely, and
since we have taken $N(T)$ to be the largest number of different signals which
may be transmitted over the channel in T time units, it is therefore reasonable
to suppose that $C(T)$ is approximately the maximum number of bits of information
transmitted over the channel per signal in unit time. Since there can be only
one signal on the channel at a time, $C(T)$ is approximately the maximum number
of bits which can be handled by the channel in unit time. The approximation
will tend to be better the larger we take T, so we are led to define the
capacity of the channel to be:

$$C = \lim_{T \to \infty} C(T) = \lim_{T \to \infty} \frac{\log_2 N(T)}{T}.$$

For any practical application of this concept the trick is to
determine $N(T)$ from the physical characteristics of the channel or from any
theorems we may derive which involve C. In the next section we shall discuss
an important example of the first procedure, and later, in section I.3.7,
we shall present a theorem which has been used to approximate C empirically.

3.2 A Special Case of Channel Capacity

For the moment we shall restrict ourselves to a special class of
transmitter-channel combination which, possibly, is best illustrated by the
familiar case of the dot-dash telegraphy code. We suppose that at any instant
there either is or is not a signal on the wire connecting the transmitter to
the receiver. A dot will be represented by one time unit of signal and one
time unit of no signal, and a dash by three units of signal followed by one
unit of no signal. Between letters we allow three units of no signal and
between words six units of no signal. Problem: compute the channel capacity.

Let us define two different states for this system which we shall
call $a_1$ and $a_2$. The system is in state $a_1$ following either a letter or a
word space, and it is in state $a_2$ following either a dot or a dash. Since a
word or letter space can never follow either a word or letter space, we know
that the next signal after the system is in state $a_1$ must be a dot or a dash,
and that the next state must therefore be $a_2$; however, when the system is
in state $a_2$ it can be followed by any of the four possibilities and so by
either state $a_1$ or $a_2$. This is illustrated schematically in Fig. 3.

Fig. 3

We are now in a position to generalize this in a natural manner to a
system having m possible states $a_1, \ldots, a_m$ and n possible signals
$S_1, \ldots, S_n$. When the system is in state $a_i$ only a certain subset of
the signals may arise; let $S_i$ denote this subset. The state $a_i$ and an
admissible signal s together determine what the next state will be. Let us
denote it $a_j$. For all such possible triples $(i, s, j)$ let $b_{ij}^{(s)}$ denote
the time duration of the $s$th symbol. Obviously certain of the combinations
cannot arise, e.g., in the telegraphy case the triple ($a_1$, word space, $a_2$)
is not admissible (see Fig. 3). On the other hand, ($a_1$, dash, $a_2$) is
admissible and its b value is four time units.

The channel capacity of this system can be shown [87] to be given by

$$C = \log_2 W,$$

where W is the largest real root of the determinantal equation

$$\left| \sum_s W^{-b_{ij}^{(s)}} - \delta_{ij} \right| = 0,$$

where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$.

In the telegraphy case, the graph of Fig. 3 can be put in the
following matrix form:

                          Next State
    Present State    a_1                     a_2
    a_1              (none)                  dot or dash
    a_2              letter or word space    dot or dash

From this we see that the determinantal equation reads

$$\begin{vmatrix} -1 & W^{-2} + W^{-4} \\ W^{-3} + W^{-6} & W^{-2} + W^{-4} - 1 \end{vmatrix} = 0,$$

that is,

$$W^{10} - W^{8} - W^{6} - W^{5} - W^{3} - W^{2} - 1 = 0.$$

Solving for W and computing $\log_2 W$ we find that C = 0.539 bits per unit time.
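The value C = 0.539 can be checked numerically. The following Python sketch is not part of the original report; the function names, the bisection interval [1, 2], and the convention that a signal sequence starts in state $a_1$ are our own assumptions. It finds the positive root of the polynomial form of the determinantal equation and, as a cross-check on the limit definition of section 3.1, counts the admissible sequences of duration T directly.

```python
from math import log2

def telegraph_equation(w):
    """Left side of W^10 - W^8 - W^6 - W^5 - W^3 - W^2 - 1 = 0."""
    return w**10 - w**8 - w**6 - w**5 - w**3 - w**2 - 1

def bisect_root(f, lo, hi, iterations=100):
    """Root of f in [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def count_signals(T):
    """N(T): admissible dot/dash/space sequences lasting exactly T units."""
    ends_a1 = [0] * (T + 1)   # sequences ending after a letter or word space
    ends_a2 = [0] * (T + 1)   # sequences ending after a dot or dash
    ends_a1[0] = 1            # assumption: the empty sequence is in state a_1
    for t in range(1, T + 1):
        for d in (2, 4):      # dot (2 units), dash (4 units): from either state
            if t >= d:
                ends_a2[t] += ends_a1[t - d] + ends_a2[t - d]
        for d in (3, 6):      # letter space, word space: only from state a_2
            if t >= d:
                ends_a1[t] += ends_a2[t - d]
    return ends_a1[T] + ends_a2[T]

# The polynomial has a single sign change, hence exactly one positive root.
W = bisect_root(telegraph_equation, 1.0, 2.0)
C = log2(W)
C_400 = log2(count_signals(400)) / 400

print(f"W = {W:.4f}, C = {C:.3f} bits per unit time")
print(f"log2 N(400) / 400 = {C_400:.3f}")
```

The direct count approaches the limit slowly, since $\log_2 N(T)/T$ differs from C by a term of order 1/T; the root of the determinantal equation gives the limit at once.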

Much more will be said about channel capacity before we are through, but
first it is necessary to discuss the source and to develop a suitable measure
for the average information generated by any discrete source.


3.3 The Discrete Source


As we have said, we assume that there is a source which makes
selections (with replacement) from a finite set of elements and that messages
are generated by temporally ordered sequences of selections from this set. The
situation we have in mind is analogous to the way we form English sentences by
ordered selections of letters, blanks, and punctuation marks.

A moment's reflection about English will suggest two important
considerations:

1. there is no reason to suppose that the probability that one
symbol will be selected is the same as that for another symbol: the letter
's' is much less frequently used in English than is [illegible].

2. in general, the choice of one symbol in the middle of a message
will not be independent of the preceding choices: while 'e' has a high a priori
probability of being chosen, the probability is markedly reduced if the letters
[illegible] have already been received and it is markedly increased if the
letters [illegible] have been received.

While most human sources produce an interdependence between symbol
selections - often called intersymbol influences - there are some cases of
independent selections, such as the transmission of random numbers or of an
unconnected set of telephone numbers. In the next section we shall analyse the
case of independent selections and in section I.3.6 the more complicated case
where there are dependencies.

To deal with these problems of symbols selected with different
frequencies and of the interdependence of symbol selection, we shall obviously
want to introduce probability distributions over the set of symbols. For this
to make sense, we shall have to assume that the source is homogeneous in
time, so that its statistical character - measured by any statistical
parameter we choose - is the same at one time as at any other time. Such a
source is said to be stationary and the time series (of symbol selections) is
called a stationary time series. This assumption is essential to the theory;
it is one which seems plausible for some sources and not for others (we shall
return to this in part II on applications). In most cases, however, it is quite
difficult to assure oneself that a source is stationary; the problem is very
closely related to the difficulty in deciding whether a particular finite set
of numbers can be considered a typical sample from a random sequence. The
condition serves, however, to prevent us from considering as one source the
New York Times from time 0 to time T and Izvestia from time T to time T',
for the statistical structure of messages in these two time intervals will
certainly be different - indeed, some of the symbols will differ.

Assuming a stationary source S, we may now introduce a little
necessary notation. We let $p(i)$ denote the probability that symbol $i \in S$
will be selected and $p(i,j)$ the probability that symbols i and $j \in S$ will be
selected in the order i and then j. In general, $p(i,j) \neq p(j,i)$ (consider,
for example, q and u in English). In general, if $i_1, i_2, \ldots, i_k$ is an
ordered sequence of symbols, $p(i_1, i_2, \ldots, i_k)$ denotes the probability of
its occurrence.

The selection of symbols is said to be independent if for every k
and every possible sequence $i_1, i_2, \ldots, i_k$,

$$p(i_1, i_2, \ldots, i_k) = p(i_1)\, p(i_2) \cdots p(i_k).$$

Before turning to the analysis of the case of independent selections,
it may be of interest to indicate some of the effects of various assumed
statistical dependencies. We shall present the output generated from a
source which takes into account some (but not all) of the statistical
structure of English. First, suppose that selections are independent but with
the simple frequencies of English text. Using these frequencies and a table of
random numbers, Shannon [87] generated

    OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA
    OOBTTVA NAH BRL.

If, however, one admits intersymbol influence, one may, for example, generate
a message in which each selection depends on the two preceding ones. Using
such data for English, Shannon generated

    IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF
    DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

Neither message is English, but the second is 'more' English than the first. [1]

1. The greater ease the typist found in typing the second passage as against the
first is interesting in this connection.


3.4 Information Measure for Independent Selections


Let us assume for the present that messages are generated by
independent selections from a discrete source. Statistically, then, the source
is completely characterized by the probability distribution

$$P = \{p(1), p(2), \ldots, p(n)\}$$

of selection of the n symbols of the source S. The problem is to
assign a number to the source, i.e., to the probability distribution P,
which we feel is a suitable measure of the average amount of information
per symbol in S. There are at least four ways to get to an answer (fortunately
the same answer), and since each reveals something of the structure of the
problem and since the resulting statistic is of such great importance, we
shall present all four.

What we want is a function which assigns a number to each probability
distribution; we may denote it by $H = H[p(1), p(2), \ldots, p(n)]$.

The first procedure, which is heuristic and easily remembered, rests
on accepting the argument of section I.2.1 that when there are $n = 2^N$ equally
likely alternatives, then a suitable measure of the amount of information is
$H = \log_2 n$. Let us extend this definition to n equally likely alternatives
where n is now any integer, i.e., we shall say there are $\log_2 n$ bits per
selection from among n equally likely selections. Now, if we consider any
event of probability $p = 1/n$, then we may treat this event as one among n
equally likely alternatives and so the information involved in its selection
is

$$\log_2 n = -\log_2 p.$$

Finally, consider an event of probability p (not necessarily the reciprocal of
an integer): it is plausible to extend the above definitions further and to say
that $-\log_2 p$ bits of information are transmitted by the occurrence of this
event of probability p. Thus, for the given source S, the selection of symbol i,
which occurs with probability $p(i)$, transmits $-\log_2 p(i)$ bits of information.
We see that this has the very reasonable property that the occurrence of a
very rare event transmits a great deal of information and an event with
probability near 1 transmits almost no information. On the average, however,
the amount of information transmitted is the expected value of a single
selection from the source, i.e.,

$$H = -\sum_{i=1}^{n} p(i) \log_2 p(i) \quad \text{bits/symbol}.$$
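The two quantities just introduced can be evaluated directly. The following is a minimal Python sketch, not part of the original report; the names `surprisal` and `entropy` are our own, and terms with $p(i) = 0$ are assumed to contribute nothing to the sum (the usual convention, since $p \log p \to 0$ as $p \to 0$).

```python
from math import log2

def surprisal(p):
    """Bits transmitted by the occurrence of an event of probability p."""
    return -log2(p)

def entropy(probs):
    """Average information in bits per symbol: -sum of p(i) log2 p(i)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols of probability zero never occur and add nothing to the average.
    return -sum(p * log2(p) for p in probs if p > 0)

rare = surprisal(1 / 1024)          # a very rare event: many bits
common = surprisal(0.99)            # a nearly certain event: almost none
h_example = entropy([0.5, 0.25, 0.25])

print(rare, common, h_example)
```

For the three-symbol distribution the individual selections carry 1, 2, and 2 bits, and their expected value is 1.5 bits per symbol.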

The above expression is without a doubt the best known aspect of
information theory, and there are reasons to believe that this expression
has blinded some to the content of the theory. It is, of course, nothing
more or less than a statistical parameter defined for all distributions
and as such is similar to the variance; it obtains meaning and value in
only two ways: first, as it is given a meaning in a theory, and second, as
it becomes a conventionally accepted way of summarizing certain phenomena.

Shannon called H the entropy of the source, or more properly of the
distribution, since the same expression arises in statistical mechanics and
is called entropy there. There has been considerable controversy as to whether
this is only a formal similarity, or whether physical entropy and information
are two closely related phenomena. This is a point requiring careful and
sophisticated discussion and a rather deeper knowledge of physics than we can
assume here. Certain authors have been displeased with the term 'entropy' and
they have used terms such as the 'amount of information' or the 'information,'
the 'specificity,' and the 'uncertainty of the source.' For the most part,
however, 'entropy' is used, and so we shall employ it without committing
ourselves to the identity of information and physical entropy.

The second procedure, which many feel to be the simplest and most
elegant, amounts to a rigorous formulation of the first one. The technique
is to state four intuitively acceptable conditions which must be met by a
concept of the information transmitted when a symbol i is selected, given
that the a priori probability of its selection is $p(i)$; from these conditions
we shall derive the above expression; they are:

1. Independence assumption. The information transmitted by a
selection of i shall be a real number which depends only on $p(i)$ and not
on the probability distribution over the other symbols. Thus, we may denote
the information transmitted by $f(p(i))$.

2. Continuity assumption. $f(p(i))$ shall be a continuous function
of $p(i)$, for a very small change in $p(i)$ should result in only a small change
in the information transmitted.

3. Additivity assumption. If two independent selections i and j
with probabilities $p(i)$ and $p(j)$ are effected, then the information
transmitted in the joint selection $(i,j)$, which has probability $p(i)p(j)$ of
occurring, should be the simple sum of the information transmitted by each of
the selections, i.e.,

$$f(p(i)\,p(j)) = f(p(i)) + f(p(j)).$$

4. Scale assumption. In our discussion of the bit, we said that a
selection with probability 1/2 shall convey one bit, so we assume

$$f(1/2) = 1.$$

It follows at once from 3 that $f(p^x) = x f(p)$ for x any positive
integer (and hence for any positive rational x), and so by the continuity
assumption $f(p^x) = x f(p)$ for any positive real number x. Any value of p
can be written in the form $(1/2)^x$, i.e., $-\log_2 p = x$. But by 4,
$f((1/2)^x) = x f(1/2) = x$, so $f(p) = f((1/2)^x) = x = -\log_2 p$.

The expected value of the information transmitted by a source with
probability distribution $p(i)$ is therefore

$$H = -\sum_{i=1}^{n} p(i) \log_2 p(i).$$

A third method to obtain the entropy expression, which is due to
Shannon [87], is similar to the last one except that it deals with the whole
distribution at once. The procedure is to state a series of four a priori
and intuitive conditions which it is felt must be met by any measure of the
average amount of information per symbol in the source.

1. The average information transmitted shall be a real-valued
function of the n arguments $p(1), p(2), \ldots, p(n)$, which we shall denote by
$H[p(1), p(2), \ldots, p(n)]$.

Next, it seems reasonable, as in the second method, to suppose
that if we were to change the distribution only slightly, then H should also
change only slightly, so we require that

2. H shall be a continuous function in each of its n arguments.

Further, suppose we consider all sources for which the symbols are
equally likely, i.e., $p(i) = 1/n$. As n is increased there is more information
transmitted by the selection of one symbol since more messages of a given
length are possible, so we require

3. When $p(i) = 1/n$ for all i, then H is a monotonically increasing
function of n.

Finally, we wish to require that if the calculation of the amount
of information in a source is divided into a series of subcalculations then
the mode of subdivision shall not alter its value. More exactly, suppose S'
is a subset of S (which by relabeling we may always take to be the elements
1, 2, ..., s). The set S' can, of course, be treated as a single element s'
with probability of occurrence

$$p(s') = p(1) + p(2) + \cdots + p(s).$$

If H is known we can compute it for S, for the set with elements s', s+1, ..., n,
and for the set S' alone. Our condition asserts that the first number shall
be equal to the weighted sum of the last two, i.e.,

4. $$H[p(1), p(2), \ldots, p(n)] = H[p(s'), p(s+1), \ldots, p(n)] + p(s')\, H\!\left[\frac{p(1)}{p(s')}, \frac{p(2)}{p(s')}, \ldots, \frac{p(s)}{p(s')}\right].$$

From these four conditions, each of which seems to be necessary,
Shannon has shown, in a manner not unlike that employed in the second method,
that H must be of the form

$$-K \sum_{i=1}^{n} p(i) \log p(i).$$

If we choose the scale constant K to be 1 and if we take the logarithm to the
base 2, then the binary equally likely case has a value of 1 (as it should, to
be one bit), and we arrive once again at the entropy expression

$$H = -\sum_{i=1}^{n} p(i) \log_2 p(i).$$
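Condition 4 is the least familiar of the four, and a numerical check may make it concrete. The sketch below is our own illustration, not Shannon's proof; the distribution and the helper name `regrouped_entropy` are hypothetical. It merges the first s symbols into a single element s' and confirms that the entropy expression gives the same number either way.

```python
from math import log2

def entropy(probs):
    """-sum of p(i) log2 p(i), ignoring symbols of probability zero."""
    return -sum(p * log2(p) for p in probs if p > 0)

def regrouped_entropy(probs, s):
    """Right-hand side of condition 4 with S' = the first s symbols."""
    p_group = sum(probs[:s])                      # p(s')
    merged = [p_group] + list(probs[s:])          # s', s+1, ..., n
    within = [p / p_group for p in probs[:s]]     # distribution inside S'
    return entropy(merged) + p_group * entropy(within)

probs = [0.5, 0.25, 0.125, 0.125]                 # illustrative values only
direct = entropy(probs)
by_groups = [regrouped_entropy(probs, s) for s in (1, 2, 3)]

print(direct, by_groups)
```

Whatever the size of the subset S', the subdivided calculation agrees with the direct one, which is what the condition demands.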

Before we discuss any of the properties of H and relate it to the
other quantity - channel capacity - which we have defined, we shall arrive
at the expression for H from the fourth point of view. The following argument
is given by Fano [14] and it is similar to one by Shannon [87]. A plausible
way to compare two different sources is to define a recoding of any source
which takes into account the probability distribution of the source and which
results in one of a set of standard normal forms of sources. If these normal
forms are such that we can assign a number to each in an intuitively
acceptable way, then we have indirectly assigned a number to each source. Of
course, the only sources we have associated any numbers to are the binary
equally likely ones, so it is more than reasonable that we should attempt a
recoding into binary equally likely selections.

This may be done in the following manner. Form all possible
messages of length r, i.e., those consisting of r symbols, and call this
set R. Since the selections are independent, the probability of each message
is simply the product of the probabilities of the individual selections
which make it up, hence we know the probability of each message. Thus we
have a probability distribution over R. Divide R into a subset $R_1$ and its
complement $\bar{R}_1$ with respect to R in such a manner that the sum of the
probabilities of messages in $R_1$ is as near 1/2 as possible. To each message in
$R_1$ assign the digit 1 and to each in $\bar{R}_1$ the digit 0. Now, divide $R_1$ into a
subset $R_2$ and its complement $\bar{R}_2$ with respect to $R_1$ (not R). Again the choice
of $R_2$ is such that the probability of messages in $R_2$ is as nearly equal as
possible to those in $\bar{R}_2$. To those messages in $R_2$ assign a second digit 1,
so now 11 is assigned to each message in $R_2$. To those in $\bar{R}_2$ assign as the
second digit 0, so 10 is assigned to each message of $\bar{R}_2$. Carry out a
similar process in $\bar{R}_1$ leading to the numbers 01 and 00. Continue this
'probability halving' until the classes contain single messages. In this
manner each message will have assigned to it a sequence of binary digits,
the length of the sequence being in large part determined by the probability
of the occurrence of the message - the more probable messages having fewer
digits.

An example may make the process clearer:

    Message   Probability     First   Second   Third   Fourth
              of occurrence   digit   digit    digit   digit
    A         0.50            1       -        -       -
    B         0.13            0       1        1       -
    C         0.12            0       1        0       -
    D         0.12            0       0        1       -
    E         0.06            0       0        0       1
    F         0.07            0       0        0       0

The first division is between {A} and {B,C,D,E,F}. No further division
of {A} is possible, and the other set is divided into {B,C} and {D,E,F}.
These in turn were divided as {B} and {C} and as {D} and {E,F}.
The final division is of {E,F} into {E} and {F}.

Such a coding as this is efficient in the sense that the fewest
number of binary digits is assigned to the most probable message and the
largest number to the least probable ones. Now, one can ask how many binary
digits are required on the average per symbol when messages of length r are
considered. That is, for each message we multiply the number of digits
required by the probability that the message occurs, sum these products over
all messages, and divide the sum by the total number of symbols r in the
message. Call this number $H_r$. In the above example $H_r = 2.13/r$ bits per
symbol. The limit of $H_r$ as $r \to \infty$ is a number assigned to each discrete
source which both has a plausible meaning and will serve to compare different
sources.

Fortunately, it can be shown that

$$H = \lim_{r \to \infty} H_r = -\sum_i p(i) \log_2 p(i).$$
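The 'probability halving' procedure and the figure of 2.13 digits can be reproduced mechanically. The Python sketch below is our own reading of the procedure, not Fano's or Shannon's code: it assumes the messages are kept in the order listed in the table, that each group is cut at the point making its two parts as nearly equal in probability as possible, and that the first part receives the digit 1. The name `halving_code` is hypothetical.

```python
from math import log2

def halving_code(messages):
    """Assign binary digits by repeated probability halving.

    messages: list of (name, probability) pairs, kept in the given order.
    Returns a dict mapping each name to its string of binary digits.
    """
    codes = {name: "" for name, _ in messages}

    def split(group):
        if len(group) < 2:
            return                      # a single message needs no more digits
        total = sum(p for _, p in group)
        best_cut, best_gap, running = 1, float("inf"), 0.0
        for cut in range(1, len(group)):
            running += group[cut - 1][1]
            gap = abs(2 * running - total)   # |first part - second part|
            if gap < best_gap - 1e-12:
                best_cut, best_gap = cut, gap
        first, second = group[:best_cut], group[best_cut:]
        for name, _ in first:
            codes[name] += "1"
        for name, _ in second:
            codes[name] += "0"
        split(first)
        split(second)

    split(list(messages))
    return codes

table = [("A", 0.50), ("B", 0.13), ("C", 0.12),
         ("D", 0.12), ("E", 0.06), ("F", 0.07)]
codes = halving_code(table)
average_digits = sum(p * len(codes[name]) for name, p in table)
entropy = -sum(p * log2(p) for _, p in table)

print(codes)
print(f"average digits = {average_digits:.2f}, entropy = {entropy:.4f}")
```

Under these assumptions the digits agree with the table, and the average number of digits per message lies only slightly above the entropy of the six-message distribution, in line with the limit statement above.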

Thus by four (really only three) routes we have come to the same
statistic as the one which is appropriate to describe the average nature of
the source. We can defend it in two further ways: first, by stating some
of its properties and by showing that they are reasonable for a measure of
information, and second, by using it to make theoretical statements about
the transmission of information.

3.5  Properties  of  H 

A number of theorems about H may be proved [87]; as we shall need
them later, and as they help to give a feel for H, we shall state them.

i. $H \geq 0$, and $H = 0$ if and only if all $p(i)$ except one equal zero.
In other words, the entropy of a distribution is always non-negative, and it
is zero if and only if the selection of one symbol is certain. Intuitively,
no information is conveyed when the selection is certain, and accordingly
$H = 0$.

ii. The maximum value of H is $\log_2 n$ and this maximum is achieved
when and only when each $p(i) = 1/n$.
In words, the maximum average information transmitted per symbol is $\log_2 n$
and that maximum occurs when and only when each of the symbols is equally
likely.

It is the above two results which have led many authors to speak
of H as the uncertainty of the source, for H has its maximum when what we
think of as uncertainty is a maximum and its minimum when absolute certainty
obtains.
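Properties i and ii can be seen at a glance on a small alphabet. The following Python sketch is only an illustration with distributions of our own choosing; it proves nothing, but it shows the two extremes and an intermediate case for n = 4.

```python
from math import log2

def entropy(probs):
    """-sum of p(i) log2 p(i), ignoring symbols of probability zero."""
    return -sum(p * log2(p) for p in probs if p > 0)

n = 4
certain = [1.0, 0.0, 0.0, 0.0]     # one symbol is selected with certainty
skewed = [0.7, 0.1, 0.1, 0.1]      # illustrative intermediate case
uniform = [1 / n] * n              # all symbols equally likely

h_certain = entropy(certain)
h_skewed = entropy(skewed)
h_uniform = entropy(uniform)

print(h_certain, h_skewed, h_uniform, log2(n))
```

The certain source has zero entropy, the equally likely source attains $\log_2 4 = 2$ bits, and the skewed source falls strictly between the two.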

iii. Let any long message of N symbols be selected and suppose it
has probability p of occurring, then $-\frac{1}{N}\log_2 p$ is an estimator of H.
This last result is, of course, of considerable importance in estimating
H in practical situations, since in general all that can be observed is
one message of some long duration. It must be pointed out that when this
result is given in precise mathematical language, it asserts that
$-\frac{1}{N}\log_2 p$ almost certainly approaches H as N approaches infinity,
i.e., the estimation scheme is consistent.
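Property iii lends itself to a small simulation. The sketch below is our own illustration for the independent case only: it draws one long message from an assumed distribution with Python's standard `random` module, computes the probability of that message as a product over its symbols (accumulated as a sum of logarithms), and compares $-\frac{1}{N}\log_2 p$ with H. The distribution, the message length, and the seed are arbitrary choices.

```python
import random
from math import log2

def entropy(probs):
    """-sum of p(i) log2 p(i), ignoring symbols of probability zero."""
    return -sum(p * log2(p) for p in probs if p > 0)

def estimate_entropy(probs, n_symbols, seed=0):
    """-(1/N) log2 p for one randomly generated message of N symbols."""
    rng = random.Random(seed)
    symbols = range(len(probs))
    message = rng.choices(symbols, weights=probs, k=n_symbols)
    # Independent selections: log2 of the message probability is a sum.
    log2_p_message = sum(log2(probs[i]) for i in message)
    return -log2_p_message / n_symbols

probs = [0.5, 0.25, 0.125, 0.125]
true_h = entropy(probs)
estimate = estimate_entropy(probs, n_symbols=20000)

print(f"H = {true_h:.3f}, estimate from one message = {estimate:.3f}")
```

In practice the probabilities themselves are unknown and must be estimated from the same message, which this sketch does not attempt.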

3.6  Non-independent  Selections 

So far our discussion of the source has been restricted to the
independent case, which, as we pointed out, does not include most sources.
But our efforts will not be lost, for fortunately we can readily carry over
the results for independent sources to the non-independent case.

We shall consider the selection of one symbol from the set
$S = \{1, 2, \ldots, n\}$ followed by a second selection (possibly the next one in
forming a message, but we do not need to restrict ourselves to that case). More
formally, we let x and y be random variables with range S. The joint
distribution of x and y is known and we shall, for convenience, denote the
probability that $x = i$ and $y = j$ by $p(i,j)$. In general, of course,
$p(i,j) \neq p(i)p(j)$ since the selections need not be independent. The
distribution $p(i,j)$ is now defined over the product space $S \times S$, but this
differs in notation only from the arbitrary source we have considered earlier,
and so our definition of entropy can be applied without alteration to the
distribution $p(i,j)$. So we have as the entropy of the joint distribution of
x, y,

$$H(x,y) = -\sum_{i,j} p(i,j) \log_2 p(i,j).$$

Similarly, the definition can be applied to the distribution of the random
variable x alone and to that of y alone, and so we have

$$H(x) = -\sum_{i,j} p(i,j) \log_2 \sum_j p(i,j) = -\sum_i p(i) \log_2 p(i)$$

and

$$H(y) = -\sum_{i,j} p(i,j) \log_2 \sum_i p(i,j) = -\sum_j p(j) \log_2 p(j),$$

where $p(i) = \sum_j p(i,j)$ and $p(j) = \sum_i p(i,j)$.

J i 

From these definitions Shannon [87] noted the following theorem:

$$H(x,y) \leq H(x) + H(y).$$

This result simply states that the intuitively desirable requirement that the
entropy (or uncertainty or information transmitted) of the joint distribution
be no more than the sum of the entropies in the two distributions separately
is fulfilled. In addition, Shannon showed that

$$H(x,y) = H(x) + H(y)$$

if and only if the events x and y are independent; thus, whenever there is
any intersymbol influence in the selections, less information is transmitted
per symbol than if they had been independent.

If we introduce the conditional probabilities relating the
distribution of y to that of x, further relationships of interest can be
established. We let $p(j|i)$ denote the conditional probability that $y = j$
given that $x = i$,

$$p(j|i) = \frac{p(i,j)}{\sum_j p(i,j)}.$$

The conditional entropy of the random variable y given that $x = i$ is defined
to be

$$H(y \mid x = i) = -\sum_j p(j|i) \log_2 p(j|i).$$

Hence the expected conditional entropy of the random variable y given x is

$$H_x(y) = -\sum_i p(i) \sum_j p(j|i) \log_2 p(j|i) = -\sum_{i,j} p(i,j) \log_2 p(j|i).$$

$H_x(y)$ measures the average uncertainty in the selection represented by y after
the selection denoted by x is known.

Shannon has shown that [1]

$$H(x,y) = H(x) + H_x(y),$$

which, in words, states that the uncertainty of the joint distribution is equal
to the uncertainty of the distribution of x added to the uncertainty of that of
y when the value of x is known. From this and the preceding result, the
following corollary is readily seen to hold:

$$H(y) \geq H_x(y),$$

i.e., the uncertainty of the distribution of y is never increased by a knowledge
of x. The two are equal if and only if the two random variables are independent.

1. This result is readily proved:

$$H(x) + H_x(y) = -\sum_{i,j} p(i,j) \log_2 \sum_j p(i,j) - \sum_{i,j} p(i,j) \log_2 p(j|i)$$
$$= -\sum_{i,j} p(i,j) \log_2 \Big[ \sum_j p(i,j) \Big] p(j|i)$$
$$= -\sum_{i,j} p(i,j) \log_2 p(i,j) = H(x,y).$$

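The relations among the joint, marginal, and conditional entropies can be verified on a small joint distribution. The Python sketch below is an illustration of our own; the 2 x 2 table `joint` is a hypothetical example in which the second selection tends to repeat the first.

```python
from math import log2

def entropy(probs):
    """-sum of p log2 p, ignoring entries of probability zero."""
    return -sum(p * log2(p) for p in probs if p > 0)

# joint[i][j] = p(i, j); illustrative values only, rows indexed by x.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_x = [sum(row) for row in joint]              # p(i) = sum over j of p(i,j)
p_y = [sum(col) for col in zip(*joint)]        # p(j) = sum over i of p(i,j)

H_xy = entropy([p for row in joint for p in row])
H_x = entropy(p_x)
H_y = entropy(p_y)

# H_x(y) = -sum over i,j of p(i,j) log2 p(j|i), with p(j|i) = p(i,j)/p(i).
H_y_given_x = -sum(
    joint[i][j] * log2(joint[i][j] / p_x[i])
    for i in range(len(joint))
    for j in range(len(joint[i]))
    if joint[i][j] > 0
)

print(H_xy, H_x, H_y, H_y_given_x)
```

Here each marginal entropy is one bit, but knowledge of x removes part of the uncertainty about y, so the joint entropy falls short of two bits.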
One final concept: the ratio of the entropy of a source to the
maximum entropy possible with the same set of symbols is a measure of the
information transmitting efficiency of the source - Shannon called it the
relative entropy. It is generally less than one, either because there is a
non-uniform distribution over the symbols or because of the non-independence
of symbol selection or, most commonly, because of both. One minus this quantity
indicates the fraction of symbols which, though sent, carry no information,
i.e., which are redundant. Thus we define the redundancy of a source to be

$$1 - \frac{H}{H_{\max}} = 1 - \frac{H}{\log_2 n}.$$

Several estimation procedures indicate that the redundancy of written English
is at least 50 per cent and very likely nearer 75 per cent (see section II.2.1).
The reason for such high redundancy will become apparent later.
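For a source with independent selections the redundancy follows directly from the distribution. The sketch below is our own minimal illustration; it accounts only for redundancy arising from a non-uniform distribution, not for intersymbol influence, and it reuses the six probabilities of the coding example in section 3.4 purely as sample input.

```python
from math import log2

def entropy(probs):
    """-sum of p(i) log2 p(i), ignoring symbols of probability zero."""
    return -sum(p * log2(p) for p in probs if p > 0)

def redundancy(probs):
    """1 - H / log2(n) for independent selections from n symbols."""
    return 1 - entropy(probs) / log2(len(probs))

uniform_redundancy = redundancy([1 / 6] * 6)
example_redundancy = redundancy([0.50, 0.13, 0.12, 0.12, 0.06, 0.07])

print(f"{uniform_redundancy:.3f} {example_redundancy:.3f}")
```

An equally likely source has no redundancy; any departure from equal likelihood makes the quantity positive.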


3.7 The Fundamental Theorem of a Noiseless System

The result we shall state in this section, which is due to Shannon
[87], shows in effect that the above definitions of channel capacity and of
source entropy or uncertainty are suitable formalizations of our intuitions
about the limitations on information transmission.

Theorem: Let the entropy of a source be H bits per symbol and the capacity
of a noiseless channel be C bits per second. For any positive number
$\epsilon$, no matter how small, there exists a coding of the output of the
source, i.e., there exists a transmitter, such that it is possible to transmit
at an average rate of $C/H - \epsilon$ symbols per second. It is not possible to
devise a code so as to transmit at an average rate of more than $C/H$ symbols
per second.

There are three points which should be made about this theorem. First, it
must be kept in mind that the definition of the entropy of a source rests
only on the statistical structure of the source, and it does not in any way
depend on the properties of the channel. Also, the capacity of the channel
depends only on channel properties and not at all on the source. The theorem
asserts that these definitions have, however, been so chosen that the ratio
C/H is the least upper bound of the transmission rate.

Second, the code which the theorem asserts to exist is, of course,
influenced by how small we take ε. If ε is near C/H then nearly any code will
do, but as ε approaches 0 fewer and fewer codes will produce a rate of
C/H - ε. But the theorem asserts that there will always be at least one. A
major unsolved problem of information theory is to devise a theorem which
describes such a code in detail for given values of C, H, and ε; the above
theorem only asserts that such a code exists.

Third, such optimal use of the channel as described in the theorem is not
effected without paying some price. The price is delay. If one is to code a
message optimally when there are intersymbol influences, then it is necessary
to wait before transmission to see what that influence is and to make use of
it in the coding, thus effecting a delay in the transmission. Similarly, at
the receiver, the translation into the language of the destination must be
delayed in exactly the same way, for a single received symbol will have
meaning only by its relation to a number of others. In practical engineering
work a compromise is reached between long delays (and hence expensive storage
equipment) and less than optimal use of the channel.

The theorem may be recast in a slightly different form, which may help
clarify it and which will be useful when we study the noisy system. Let R
denote the average rate at which symbols are transmitted over the channel
when a given code is used. The theorem then asserts that C/H >= R and that
there exist codes such that the corresponding R is arbitrarily close to C/H.
If we rewrite this as C >= HR and then maximise both sides with respect to
all possible codes we have

    C = \max_{codes} C = \max_{codes} (HR).

It is conventional, though misleading, simply to replace HR in the above
expression by H. Previously, the entropy of a source was measured in 'bits
per symbol,' but for this purpose we measure the entropy of the source (and
transmitter combination) in 'bits per symbol' times 'symbols per second,'
i.e., in 'bits per second' transmitted. The theorem then asserts that the
channel capacity is equal to the maximum number of bits per second which can
be transmitted by the source-transmitter combination over the channel. It is
this form, and the corresponding form for noisy systems, in which the
fundamental theorem has been used in behavioral applications.

4. The Discrete Noisy System

As in the preceding section we shall suppose that the source is discrete,
but we shall drop the condition of a noiseless system.

4.1 Equivocation and Channel Capacity

The significant effect of noise in a system, as we pointed out in section
I.2.2, is to cause the destination to believe sometimes that a different
symbol was transmitted from that which actually was. Any other properties
the noise may have are irrelevant to this theory of information transmission.
Thus, if we assume that both the signal and the noise time series are
stationary, then the noise is completely characterised by the matrix of
conditional probabilities p(j|i) which state the probability that symbol j is
received when i was sent. Formally, this situation is identical to the case
of non-independent selections; in that case we interpreted j as a selection
following i; here we shall interpret j as the selection received at the
destination when i was actually selected at the source.

The quantities H(x), H(y), H(x,y), and H_x(y) are defined as before. H(x) is
the entropy of the source distribution, H(y) the entropy of the destination
distribution, H(x,y) the entropy of the joint distribution of x and y,
H_y(x) measures the average ambiguity in the signal sent given the received
signal, while H_x(y) measures the average ambiguity of the received signal
given the signal sent. When we are considering noise, H_y(x) is called the
equivocation.

If a system is noiseless, then H_x(y) = 0 = H_y(x) and so H(x) = H(y).

Let us suppose that all the entropies are calculated in bits/sec, rather
than bits/symbol; then the effective average rate of transmission, R (in
bits/sec), is the average rate of information sent, H(x), minus that which
was lost as a result of the noise, H_y(x):

    R = H(x) - H_y(x).

This can easily be shown to be equal to two other expressions, the first of
which states that the rate of transmission is the difference between what
was received and what was received incorrectly. In symbols,

    R = H(y) - H_x(y)
      = H(x) + H(y) - H(x,y).
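
The equality of the three expressions for R can be checked numerically. The sketch below is only an illustration: the function names are ours, and the joint distribution is an assumed example (equally likely binary symbols, each received incorrectly with probability 0.1). The conditional entropies are computed directly from conditional probabilities rather than by subtraction, so that the agreement is a genuine check.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """-sum p(i,j) log2 p(i|j): entropy of the row variable given the column."""
    col_tot = [sum(col) for col in zip(*joint)]
    return -sum(p * math.log2(p / col_tot[j])
                for row in joint for j, p in enumerate(row) if p > 0)

def rates(joint):
    """Three expressions for R, with joint[i][j] = p(i, j), x = i, y = j."""
    transposed = [list(col) for col in zip(*joint)]
    hx = H([sum(row) for row in joint])
    hy = H([sum(row) for row in transposed])
    hxy = H([p for row in joint for p in row])
    hy_x = conditional_entropy(joint)        # equivocation H_y(x)
    hx_y = conditional_entropy(transposed)   # H_x(y)
    return hx - hy_x, hy - hx_y, hx + hy - hxy

joint = [[0.45, 0.05],
         [0.05, 0.45]]   # assumed example, not taken from the text
r1, r2, r3 = rates(joint)
print(r1, r2, r3)        # all three are about 0.531 bits per symbol
```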

The notion of rate of transmission for the noisy case is analogous to that
introduced for the noiseless case in the last statement of the fundamental
theorem of the noiseless case (section I.3.7), and it suggests that one way
to define channel capacity in the noisy case is as follows:

    C = \max_{codes} [H(x) - H_y(x)].

By the theorem of section I.3.7, this definition reduces to that of channel
capacity in the noiseless case since H_y(x) = 0. However, it does not reduce
directly to the definition of channel capacity as given in section I.3.1,
but at the end of the next section we shall present a theorem which shows
that there is an analogous, though more complicated, definition for the
noisy case.
Figure 4

There is assumed to be an observer who is able to perceive both the
selections made by the source and the corresponding signals received at the
receiver. Let us suppose that the equivocation due to noise is H_y(x); then
if there is a noiseless correction channel from the observer to the
destination with capacity H_y(x) bits/sec, it can be shown [87] that it is
possible to encode the correction data in such a manner as to correct all
but an arbitrarily small fraction of the errors due to the noise. This is
impossible to do if the channel capacity of the correction channel is less
than H_y(x). While this theorem is of some theoretical interest, it is
certainly not a practical scheme to combat noise. We turn, therefore, to a
consideration of communication systems in the sense of section I.2.1.

The following result, due to Shannon [87], is the fundamental theorem of
the noisy case:

Theorem: Let the entropy of a source be H bits per second and the capacity
of the channel C bits per second. If H < C, then there exists a coding scheme
such that the output of the source can be transmitted over the channel with
an arbitrarily small frequency of errors. If H > C, it is possible to reduce
the equivocation to as near H - C as one chooses, but it is not possible to
reduce it below H - C.

McMillan's comments on this result seem to be worth repetition:
"Engineering experience has been that the presence in the channel of
perturbations, noise, in the engineer's language, always degrades the
exactitude of transmission. [The theorem] above leads us to expect that this
need not always be the case; that perfect transmission can sometimes be
achieved in spite of noise. This practical conclusion runs so counter to
naive experience that it has been publicly challenged on occasion. What is
overlooked by the challengers is, of course, that 'perfect transmission' is
here defined quantitatively in terms of the capabilities of the channel or
medium; perfection can be possible only when transmission proceeds at a slow
enough rate. When it is pointed out that merely by repeating each message
sufficiently often one can achieve virtually perfect transmission at a very
slow rate, the challenger usually withdraws. In doing so, however, he is
again misled, for in most cases the device of repeating messages for
accuracy does not by any means exploit the actual capacity of the channel.

"Historically, engineers have always faced the problem of bulk in their
messages, that is, the problem of transmitting rapidly or efficiently in
order to make a given facility as useful as possible. The problem of noise
has also plagued them, and in many contexts it was realised that some kind
of exchange was possible, for example, noise could be eliminated by slower
or less 'efficient' transmission. Shannon's theorem has given a general and
precise statement of the asymptotic manner in which this exchange takes
place." [p. 207, 63]

He goes on to point out the similarity in the exchange between [words
illegible in the source] in statistical tests.

While the simple repetition of a message is not generally a suitable way to
use the channel capacity to eliminate errors, some form of redundant
transmission is required. In general it will be far more complicated than
repetition, but as with repetition a delay in the reception of a message
must result. The essential point of the theorem is that the delay need not
be such as to reduce the rate of transmission to zero, as might be thought
to be the case. The proof of the theorem is not constructive and so there is
no indication of what code to use in order to utilise the channel capacity
fully. Shannon writes, "Probably this is no accident but is related to the
difficulty of giving an explicit construction for a good approximation to a
random sequence." [p. 43, 87] Much recent (engineering) work in information
theory has been devoted to finding near optimal codes for certain important
special cases.


The fundamental theorem of the noisy case may be recast in a form which
shows the relation of the present definition of capacity to that given for
the noiseless case. Let q be a number such that 0 < q < 1. Consider all
possible signals of duration T time units which might be transmitted over
the channel and let S denote a typical subset of these signals. Under the
assumption that each signal of S is equally probable, let a receiver be
designed which is to select from S the most probable element as the cause of
the signal it receives. It is clear that in general errors will be made; let
p(S) denote the probability that an incorrect interpretation will be made
when the subset is S. Consider now all those subsets S such that p(S) < q.
Among these sets there is one which contains the most signals; let that
number be denoted by N(T,q). Shannon [87] then showed that

    C = \lim_{T \to \infty} \frac{\log_2 N(T,q)}{T},

which is clearly analogous to the original definition of channel capacity
for the noiseless case. It is remarkable that this result is independent of
the value of q. Presumably, however, the rate of convergence is not
independent of q, and so in any application of the theorem attempts should
be made to exploit the freedom in choosing q.


4.3 Channel Capacity of a Noisy System with Independent Selections


Shannon showed that if one assumes that the selections at the source are
independent, then the capacity of the channel is given by the transcendental
equation

    \sum_j 2^{ \sum_i h(i|j) [ -C + \sum_k p(k|i) \log_2 p(k|i) ] } = 1,

where h(i|j) is a typical element of the inverse of the noise matrix, i.e.,

    \sum_j h(i|j) p(j|k) = \delta_{ik}.

It is difficult, if not impossible, to see the dependence of channel
capacity on the noise matrix from this equation, but, of course, in any
given case one can solve numerically for C. However, if we can assume that
the noise has the same disturbing effect on each symbol of the source, i.e.,
if

    \sum_j p(j|i) \log_2 p(j|i) = \sum_j p(j|k) \log_2 p(j|k)

for all i and k, then it can be shown [35] that

    C = \log_2 n - H(y|x).

In the special case of a binary source (two elements) and noise such that
the probability of an erroneous transmission is a, the capacity is given by

    C = 1 + a \log_2 a + (1 - a) \log_2 (1 - a).

It is easy to make interesting calculations using this last expression. For
example, if we take the probability of making an error to be 1 per cent,
then the channel capacity is reduced to approximately 90 per cent of its
value in the absence of noise. This marked non-linearity must be kept in
mind whenever thinking about the effects of noise.
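
The non-linearity is easy to tabulate. The following sketch evaluates the binary capacity expression for a few error probabilities; the function name and the chosen probabilities are ours and serve only as an illustration.

```python
import math

def binary_capacity(a):
    """Capacity, in bits per symbol, of a binary channel with error probability a."""
    if a in (0.0, 1.0):
        return 1.0  # the limit of a*log2(a) as a -> 0 is 0
    return 1.0 + a * math.log2(a) + (1.0 - a) * math.log2(1.0 - a)

for a in (0.0, 0.01, 0.05, 0.1, 0.5):
    print(f"a = {a:4.2f}   C = {binary_capacity(a):.3f}")
```

An error probability of 0.01 gives C of about 0.919, so a 1 per cent error rate costs roughly 8 per cent of the capacity, while at a = 0.5 the capacity falls to zero.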
5. Some Aspects of Discrete Theory Related to Applications

5.1 Inverse Probabilities, Bayes Theorem, Contingency Tables


As we shall see in some detail in Part II, many of the applications of
information theory in psychology are to problems not classically described
as communication problems. Indeed, they are communication problems only in
the sense that any experiment, or any decision, can be treated as a
transmission of information. It is probably more fruitful to remark that in
the attempt to analyse communication systems a mathematical formalism has
been produced which can be completely divorced from its realisation as a
communication system. At the same time, there are other realisations of the
same mathematical system in psychology. Because of its origins, however, the
information terminology is associated with the mathematics and so with any
applications which are made. Some of this vocabulary may seem peculiar in
the applications, but it is probably not as misleading as it may initially
seem. In this section, we propose to discuss, but divorced from the
communication model, a part of the formalism which has been particularly
important in psychological applications. We shall relate the rate of
information transmission to Bayes theorem, we shall generalise the notion of
rate of transmission, and we shall discuss the statistical sampling and
significance problems.

The structure of very many problems in psychology and in the other
behavioral sciences reduces to the existence of two classes of possible
occurrences, often called stimuli and responses, such that an occurrence in
the response class is in some degree dependent on what stimuli occurred. It
is not easy to characterize in a useful and simple way the relation between
these two classes of occurrences. It is, of course, possible to present the
whole matrix of joint probabilities p(i,j), i.e., to give the entire
contingency table, but this can hardly be called simple. Various measures of
contingency have been proposed and used, but objections have been raised to
these. Another possibility, and one which certainly has found favor among
some psychologists, is the entropy measure. The expression most often used is

    R = H(x) + H(y) - H(x,y),

which, when the entropies are measured in bits/sec, was called the rate of
information transmission (section I.4.1). More often than not in the
psychological applications time does not enter in a natural manner and it is
more appropriate to treat the stimuli and the responses as static and to
measure entropies in bits. In that case the following notation is employed:

    T(x;y) = H(x) + H(y) - H(x,y)
           = H(x) - H_y(x)
           = H(y) - H_x(y),

and the quantity T(x;y) is simply called the information transmitted from
the stimulus to the response. It is a quantity which is 0 when the random
variables x and y are statistically independent and it is a maximum when
they are in one-to-one correspondence, i.e., when a knowledge of the value
of x uniquely determines the value of y. In other words, T is a measure of
the contingency between x and y.

Note that in this interpretation of the formalism the role of the human
being has changed. Previously, we had thought of the source and the
destination as people and the channel as a physical entity. In most
behavioral applications, the stimuli correspond to the source and the
responses to the destination; the subject is treated as a noisy channel
causing less than perfect correspondence between the stimuli and his
responses.

Another way to think about the problem of the relation between the two
random variables x and y is in terms of reconstructing the value of x as
well as possible from a knowledge of the value of y. This is, of course, the
problem of inverse probabilities which has had a long history in statistical
theory, and Bayes theorem is one of the most famous results. We may think of
it in the following form: There are n possible underlying states of nature,
i = 1,2,...,n, which are known a priori to have a probability p(i) of
occurring. We suppose an experiment is performed with possible outcomes
j = 1,2,...,m, whose outcome depends somewhat on which state obtains. Let x
be a random variable with range the states of nature and distributed
according to p(i) and y a random variable with range the experimental
outcomes. Further, let us assume as known the conditional probabilities,
p(j|i), that y = j when x = i. The problem then is to estimate the
probability that x = i when the outcome of the experiment is known, i.e.,
when y = j is given.

Cherry describes the analogy to the noisy communication system as
"... an observer receives the distorted output signals (the posterior data)
from which he attempts to reconstruct the input signals (the hypotheses),
knowing only the language statistics (the prior data)." [p. 39, 8]

It is well known that Bayes theorem states

    p(i|j) = \frac{p(i) p(j|i)}{p(j)},  where  p(j) = \sum_k p(k) p(j|k).

If one takes logarithms on both sides of this equation, multiplies the
result by p(i,j), and then sums on both i and j, the result is simply

    H(x) - H_y(x) = H(y) - H_x(y),

i.e., the information transmitted from x to y.
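
The connection between Bayes theorem and the transmitted information can be made concrete with a short calculation. In the sketch below the prior and the conditional probabilities are assumed values chosen only for illustration, and the function names are ours. The posterior probabilities are obtained from Bayes theorem, and the information transmitted is then computed once from the posteriors and once from the given conditional probabilities.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bayes_posterior(prior, likelihood):
    """Return (post, p_y) with post[j][i] = p(i|j) and p_y[j] = p(j).

    prior[i] = p(i); likelihood[i][j] = p(j|i).
    """
    n, m = len(prior), len(likelihood[0])
    p_y = [sum(prior[i] * likelihood[i][j] for i in range(n)) for j in range(m)]
    post = [[prior[i] * likelihood[i][j] / p_y[j] for i in range(n)]
            for j in range(m)]
    return post, p_y

prior = [0.7, 0.3]                       # assumed states of nature
likelihood = [[0.9, 0.1], [0.2, 0.8]]    # assumed p(j|i)
post, p_y = bayes_posterior(prior, likelihood)

equivocation = sum(p_y[j] * H(post[j]) for j in range(len(p_y)))      # H_y(x)
noise = sum(prior[i] * H(likelihood[i]) for i in range(len(prior)))   # H_x(y)
t_from_posterior = H(prior) - equivocation
t_from_likelihood = H(p_y) - noise
print(post[0], t_from_posterior, t_from_likelihood)
```

The two values of the transmitted information agree, as the identity requires.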

5.2 Multivariate Theory

Suppose we are analyzing a stimulus-response situation by information
theoretical techniques; then the basic equation we have developed,

    H(y) = H_x(y) + T(x;y),

decomposes an average measure of the response pattern into one part, T(x;y),
which is determined by the stimulus plus another, H_x(y), which is
unexplained (random) variation. But it may very well happen that a
considerable portion of the residue H_x(y) can be explained in a systematic
manner, though not in terms of the experimental stimulus which has so far
been considered. For example, consider an experiment in which subjects are
required to classify tones which are very near threshold into one of n
categories. It may very well happen that the subject's response is only
determined in small part by the tone presented, but that in large part it is
predictable from a knowledge of his previous response, even if we do not
know the stimulus. In such a case, it may be not only appropriate but
essential that we consider as the stimulus the pair of random variables
(u,v) where u has the possible tones as its range and v the possible
previous responses of the subject. In other words, in some cases we may be
able to understand the phenomena adequately only if we treat as the stimulus
a random variable with a range which is the product space of two, or more,
simpler sets. McGill [61, 62] has examined this problem in some detail and
he has appropriately generalized the transmission concepts so as to produce
a multivariate theory where, of course, Shannon's theory is the bivariate
case. We shall recount this development briefly.

First of all, we may replace x by the symbol (u,v), which is equivalent to
x when the range of x is the product space of the ranges of the random
variables u and v, in the equation for information transmission, and we
obtain

    T(u,v; y) = H(u,v) + H(y) - H(u,v,y).

(We have systematically omitted the extra parentheses about u,v for greater
clarity.) It is clear that in our discussion there has been no notion of
direction of transmission between source and receiver, and so they may be
interchanged; or formally

    T(u,v; y) = T(y; u,v).

Next, we would like to introduce a measure which gives the separate
dependence of y on u and on v. To do this it seems appropriate to define a
measure of the conditional information transmitted, which, for example, is
the information transmitted from the stimulus u to the response y when the
stimulus v is held constant. This, of course, will be an average quantity
which in detail is the information transmitted from u to y computed for each
possible value of v and then averaged over v. This can be shown to be given
by

    T_v(u;y) = -H(v) + H(u,v) + H(v,y) - H(u,v,y).

In like manner,

    T_u(v;y) = -H(u) + H(u,v) + H(u,y) - H(u,v,y),
    T_y(u;v) = -H(y) + H(u,y) + H(v,y) - H(u,v,y).

Clearly, v will have an effect on the transmission from u to y if and only
if T_v(u;y) ≠ T(u;y), and the magnitude of this effect is measured by

    A(uvy) = T_v(u;y) - T(u;y).

Similar quantities can be defined to measure the effect of u on the
transmission from v to y and of y on the transmission from u to v. There is
not, however, any need to introduce a new symbol for each of these since
they can all be shown to be equal, i.e.,

    A(uvy) = T_u(v;y) - T(v;y)
           = T_y(u;v) - T(u;v).

"In view of this symmetry, we may call A(uvy) the u,v,y interaction
information. We see that A(uvy) is the gain (or loss) in sample information
transmitted between any two of the variables." [p. 5, 62]


With these concepts, it is now possible to express the three-dimensional
information transmitted in terms of the two-dimensional ones and the
interaction information:

    T(u,v; y) = T(u;y) + T(v;y) + A(uvy)

              = T_v(u;y) + T_u(v;y) - A(uvy).

We may write this three-dimensional information transmission in another way
which parallels the familiar equation H(y) = H_x(y) + T(x;y), namely:

    H(y) = H_{uv}(y) + T(u,v; y)
         = H_{uv}(y) + T(u;y) + T(v;y) + A(uvy).

The term H_{uv}(y) is the residual or unexplained variability in the
response y after the information about y given by u and by v and the
interaction information of the three variables has been removed.
One unexpected result of McGill's analysis is the possibility that the
interaction term may be negative. "In other words, a knowledge of the input
[v] may decrease the amount of information that [y] has about [u] -
communication from [u] to [y] would actually be better if no data about [v]
were collected at all!" [p. 431, 72]
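
Both signs of the interaction term can be exhibited with very small examples. The sketch below is an illustration under our own assumptions: the function names are ours, and the two joint distributions are constructed examples, not data from the studies cited. In the first, y is 1 exactly when u and v differ, so that neither stimulus alone tells anything about y; in the second, v and y are both copies of u.

```python
import math
from collections import defaultdict

U, V, Y = 0, 1, 2  # positions of the variables in each (u, v, y) key

def H(joint, axes):
    """Entropy in bits of the marginal of `joint` over the given axes."""
    marg = defaultdict(float)
    for key, p in joint.items():
        marg[tuple(key[a] for a in axes)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def T(joint, a, b):
    """Information transmitted between the variable groups a and b."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)

def interaction(joint):
    """A(uvy) = T_v(u;y) - T(u;y)."""
    t_v = (-H(joint, (V,)) + H(joint, (U, V))
           + H(joint, (V, Y)) - H(joint, (U, V, Y)))
    return t_v - T(joint, (U,), (Y,))

# y = 1 exactly when u and v differ; u and v are independent fair bits.
differ = {(u, v, u ^ v): 0.25 for u in (0, 1) for v in (0, 1)}
# v and y both repeat u.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

print(interaction(differ))  # 1.0
print(interaction(copies))  # -1.0
```

In the first case holding v constant raises the transmission from u to y from 0 to 1 bit; in the second it lowers it from 1 bit to 0. In both cases T(u,v; y) equals T(u;y) + T(v;y) + A(uvy).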

One of the most important and desirable properties of the information
statistic - entropy - is the additive character, which was apparent in the
two-dimensional case and which is even more forcibly illustrated in the
three-dimensional theory. Each of the contributions, that from u, from v,
from the interaction, and from unexplained variability (while not
independent) is simply added to obtain the information in the response
pattern. Thus, the information analysis of a stimulus-response situation is
somewhat analogous to that of an analysis of variance, and McGill [61] has
examined this relation in some detail. But as he pointed out elsewhere,
"... information transmission is made to order for contingency tables.
Measures of transmitted information are zero when variables are independent
in the contingency-test sense (as opposed to the restriction to linear
independence in analysis of variance). In addition, the analysis is designed
for frequency data in discrete categories, while methods based on analysis
of variance are not." [pp. 9-10, 62] "It would seem that information theory
effectively corresponds to a nonparametric analysis of variance."
[p. 411, 72]

There is, naturally, no reason why the above analysis cannot be extended to
more dimensions than three, and McGill [62] has carried this out in some
detail. There seems little reason to reproduce that here.

In the next section we shall discuss the testing of independence hypotheses
in both the multivariate and bivariate cases.

5.3 Statistical Tests and Estimations of Entropy

In addition to the construction of models, behavioral scientists, unlike
most physical scientists, must confront the difficult statistical problem of
testing and using their models when the only data available are from small
samples. Their use of information theory is no exception to this rule, so we
turn now to that problem.

Let us suppose that a distribution p(i) governs the selections of the k
alternatives 1,2,...,k, and let us suppose that a sample of n independent
observations of selections yields n(i) cases of alternative i. The true
entropy is, of course,

    H = -\sum_{i=1}^{k} p(i) \log_2 p(i),

while

    H' = -\sum_{i=1}^{k} \frac{n(i)}{n} \log_2 \frac{n(i)}{n}

is the estimator of the entropy obtained by replacing each p(i) by its
maximum likelihood estimator n(i)/n.

Miller and Madow [73] have shown that if the p(i) are not all equal,
\sqrt{n} (H' - H) has a normal limiting distribution with mean 0 and
variance

    \sigma^2 = \sum_{i=1}^{k} p(i) [\log_2 p(i) + H]^2.

If, however, p(i) = 1/k for every i, then (2n / \log_2 e)(H - H') has a
chi-square limiting distribution with k-1 degrees of freedom.

1. In this applied work much calculation is necessary. Newman [74] has
described a specialised computer to assist in this. More interesting is the
table of p log_2 p presented by Newman and the more extensive tables of
Dolansky and Dolansky [12].

They point out that if small samples are used to estimate the entropy there
is a bias which can be corrected for by the following theorem:

    H = EH' + (\log_2 e) [ \frac{k-1}{2n} - \frac{1 - \sum_{i=1}^{k} 1/p(i)}{12 n^2} ] + O(1/n^3),

where EH' is the expected value of H' and O(1/n^3) denotes terms of the
order of 1/n^3 or smaller. They also establish a similar expression for the
variance of H', but as it is fairly complex we shall not reproduce it here.
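
In practice only the leading term of the correction can be applied directly, since the term in 1/n^2 involves the unknown p(i). The sketch below, with function names and sample counts of our own choosing, computes H' from a set of observed frequencies and adds (k - 1)(log_2 e)/(2n); it assumes that the number of alternatives k is known and equals the number of entries supplied.

```python
import math

def plugin_entropy(counts):
    """H' in bits: the entropy of the observed relative frequencies."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def corrected_entropy(counts):
    """H' plus the leading bias term (k - 1) log2(e) / (2n).

    k is taken to be len(counts); alternatives that were never observed
    should therefore be entered with a count of zero.
    """
    n = sum(counts)
    k = len(counts)
    return plugin_entropy(counts) + (k - 1) * math.log2(math.e) / (2 * n)

counts = [12, 7, 9, 4]   # hypothetical frequencies, n = 32
print(plugin_entropy(counts), corrected_entropy(counts))
```

With these counts the correction is 3(log_2 e)/64, or about 0.068 bits.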

For the case of equally likely alternatives, Rogers and Green [85] have
developed an exact expression for the expected value of H', namely,

    EH' = \log_2 n - \sum_{i=2}^{n} \binom{n-1}{i-1} (1/k)^{i-1} (1 - 1/k)^{n-i} \log_2 i.
The Miller and Madow approximation in the same case reduces to

    EH' = \log_2 k - (\log_2 e) \frac{k^2 + 6n(k - 1) - 1}{12 n^2},

which, of course, is much simpler. Rogers and Green point out that for
n > k the two give nearly the same results, but that for n < k, "... the
Miller-Madow formula ... becomes increasingly less accurate, and [their
formula] becomes more easily computable." [p. 2, 85] They also present a
similar expression for the variance which we shall not reproduce here. In
another paper [86] they present tables of the mean and variance in the
equally likely case for various values of n and k.
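
The comparison between the exact and the approximate expected value is easily carried out numerically. In the sketch below the exact value is computed from first principles, as the expectation of H' when each of the k equally likely alternatives has a binomially distributed frequency; this is our own derivation and need not coincide in form with the expression published by Rogers and Green. The function names are ours, and the code is intended only for moderate n.

```python
import math

def exact_expected_entropy(n, k):
    """E[H'] in bits for n observations of k equally likely alternatives.

    Uses E[H'] = log2(n) - sum_{i=2}^{n} C(n-1, i-1) p^(i-1) (1-p)^(n-i) log2(i)
    with p = 1/k. Intended for moderate n (say n < 1000).
    """
    p = 1.0 / k
    total = 0.0
    for i in range(2, n + 1):
        weight = math.comb(n - 1, i - 1) * p ** (i - 1) * (1 - p) ** (n - i)
        total += weight * math.log2(i)
    return math.log2(n) - total

def approximate_expected_entropy(n, k):
    """The Miller and Madow approximation for equally likely alternatives."""
    return math.log2(k) - math.log2(math.e) * (k * k + 6 * n * (k - 1) - 1) / (12 * n * n)

for n, k in ((2, 2), (4, 8), (50, 4)):
    print(n, k, exact_expected_entropy(n, k), approximate_expected_entropy(n, k))
```

For n = 2 and k = 2 the exact value is 0.5 bits, since the two observations agree half the time; for n well above k the two expressions agree closely, while for n below k they diverge.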
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I H. ller  [?Cj  has  also  treated  the  problem  of  contingency  tables 
having  r stiraulvs  alternatives  and  a response  alternatives  0 We  let  the  three 
probability  c&stributions  be  p(i ),  p(j),  and  ?(!,;]),  and  the  observed  sample 
freqnsncieft  n(i),  n(j),  and  n(i.l)  from  a sanple  of  sise  nD  The  transmitted 
information.  T-  in  of  aoursa  given  ty 


    $T = -\sum_i p(i) \log_2 p(i) - \sum_j p(j) \log_2 p(j) + \sum_{i,j} p(i,j) \log_2 p(i,j),$

and let T' be the estimator which is obtained by replacing each probability
by its maximum likelihood estimator, e.g., p(i) by n(i)/n. If λ denotes the
likelihood ratio, it is known from Wilks' [10] likelihood-ratio test of
independence that -2 log_e λ has the chi-square distribution with (r-1)(s-1)
degrees of freedom. It is not difficult to show that

    $-2 \log_e \lambda = n (\log_e 4)\, T',$

hence n(log_e 4)T' has a chi-square distribution with (r-1)(s-1) degrees of
freedom when the null hypothesis T = 0, i.e., when the stimuli and the responses
are independent, is true.
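
As a rough illustration of this test, the sketch below computes T' from a table of observed frequencies and forms the statistic n(log_e 4)T' together with its degrees of freedom. It assumes the statistic is exactly -2 log_e λ as stated above; the function names and the sample table are hypothetical, and looking up the chi-square critical value is left to the reader.

```python
import math


def transmitted_information(table):
    """Estimate T' in bits from a contingency table of counts n(i, j)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]            # n(i)
    col_totals = [sum(col) for col in zip(*table)]      # n(j)
    t = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute nothing
                t += (n_ij / n) * math.log2(n_ij * n / (row_totals[i] * col_totals[j]))
    return t


def independence_statistic(table):
    """Return (n * ln 4 * T', degrees of freedom) for the test of T = 0."""
    n = sum(sum(row) for row in table)
    r, s = len(table), len(table[0])
    return n * math.log(4) * transmitted_information(table), (r - 1) * (s - 1)


if __name__ == "__main__":
    observed = [[12, 5, 3], [4, 14, 2], [3, 6, 11]]  # hypothetical counts
    print(transmitted_information(observed))
    print(independence_statistic(observed))
```

The statistic is compared with a chi-square table at the returned degrees of freedom; a perfectly diagonal 2 x 2 table gives T' = 1 bit, and a table with equal cells gives T' = 0.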
    If y is independent of (u,v), then T(u,v;y) = 0;
    if y is independent of v, then T(v;y) = 0;
    if y is independent of v when u is held constant, then T_u(v;y) = 0.

The last two conditions each imply

    ... = T(u;y),

or, in words, v is not involved, when either of the two conditions holds, in
the transmission between u and y. There are, of course, analogous statements
for the symbols u, v.


In the same paper, Miller showed that, approximately,

    $E(T') = T + \frac{(r-1)(s-1)}{2n} \log_2 e,$

and so it is possible to correct for small sample bias. He suggests that
n should be at least 5rs in order to make estimates of the information
transmitted. He also showed that


w(T,)  - + sfe 


WV1  ' H^Toi SLo^e  * 

Fortunately, since we do not know T, we do know that n is generally much
larger than T, so the last term can be neglected and the variance is given
approximately by the first term.

McGill [62] has extended some of the above results to the multivariate
case. First, he observes that independence among the variables u, v, and y
makes the corresponding transmissions vanish.


To test the hypothesis that any of the T's are zero, McGill uses
Miller's result relating T' with the likelihood-ratio test. One obtains that
under each hypothesis in the first column, the statistic in the second column
has approximately a chi-square distribution with the degrees of freedom shown:

    Hypothesis        Statistic                 Degrees of freedom
    T(u,v;y) = 0      n(log_e 4) T'(u,v;y)      (UV-1)(Y-1)
    T(u;y) = 0        n(log_e 4) T'(u;y)        (U-1)(Y-1)
    T(v;y) = 0        n(log_e 4) T'(v;y)        (V-1)(Y-1)
    T_y(u;v) = 0      n(log_e 4) T'_y(u;v)      Y(U-1)(V-1)

where U, V, and Y are the number of points in the ranges of u, v, and y
respectively, and n is the size of the sample.


He shows that if the hypothesis

    p(i,j,k) = p(i)p(j)p(k)

is true, then T(u;y), T(v;y), and T_y(u;v) are asymptotically independent;
thus, as an approximation, the corresponding primed T's can be tested
simultaneously for significance under the null hypothesis.

McGill presents an interesting example which shows very graphically
that "... we cannot decide whether an amount of transmitted information is
big or small without knowing its degrees of freedom." [p. 16, 62]


Part II. Applications to Behavioral Problems


1. Introduction

The applications of information theory, including its indirect
influences in applied areas, are not easy either to evaluate or to summarize.
There can be little doubt that, in addition to the direct applications which
we can cite, it has had a very broad impact on the thinking of many behavioral
scientists. It has affected both the approach to the analysis of certain
types of data and the choice of problems to be considered experimentally.
Such influences cannot be succinctly described or tabulated, and we shall not
attempt to do so here. A more tangible effect of the theory in the behavioral
[...] But since these articles have appeared sporadically in most of the
behavioral areas,^1 one can hardly hope for a pattern of applications. This
fact, coupled with the inability of one person to know these various
literatures, forces us to consider the two behavioral areas where the
publications have been especially numerous and where the pattern is clearer:
psychophysics and psychology.^2

1. Biology is to some degree an exception. Much of the application to biological
questions has stemmed from the interest of Quastler, who has gathered together
much of that work in one volume [8].

2. Much of the material we shall discuss here has been summarized by Miller [72]
in somewhat less detail than we shall present here.

The realization that information theory could play an important
role in psychology came in the late forties, only a few years after the
publication of Shannon's now classic paper. The realization was symbolized
and to a large degree accelerated by a paper which Miller and Frick [6p]
published in 1949. They observe that "[a] psychologist's experiments
... generate a sequence of symbols: right and wrong, conditioned and
unconditioned, left and right, slow and fast, adient and abient, etc." [p. 314]
That is to say, very many experiments are of the stimulus-response type, where
the stimuli form one sequence and the responses another. Generally, the
procedures to analyze such data have ignored the sequential relations among
the responses (usually, though not always, sequential effects in the stimuli
have been experimentally eliminated by randomizing procedures), but ignoring
the sequential information, they pointed out, is equivalent to assuming the
independence of successive responses. It was not implied that psychologists
felt that this was a reasonable assumption, but only that the standard
statistical techniques were not suited to such an analysis. An exception to
this, of course, has been the use of contingency tables to study temporally
ordered pairs of responses (digrams) and the use of contingency measures to
characterize the degree of association between the arguments of the table.
Miller and Frick then outlined certain aspects of information theory and
proposed that the information measure be employed in such situations. As
Frick and Klemmer point out in a later paper, "The [information] measure
may be applied without logical difficulty to any situation in which one is
willing to identify the members of the stimulus and response classes and make
some statements about their probability [...]. Whether or not the
measure is useful in the analysis of human behavior remains to be proved.
Early results from its application are, however, encouraging." [p. 15, 19].

There are difficulties, however, for as Miller and Frick pointed
out, there are two serious limitations on the applicability of information
theory:

1. Sequential responses which are generated while learning is
occurring do not form a suitable sample from which to estimate the
probabilities which are needed, for the assumptions of learning and of a
stationary response time series are incompatible.

2. The difficulty of obtaining adequate samples to estimate
probabilities increases sharply with an increase in the length of dependencies
in the response sequence; in fact, beyond three step dependencies it is
completely out of hand.

Related to the last point are the computational difficulties which
arise with large amounts of sequential data. Basically, however, this problem
is less serious than the sampling one, since computation machines ideally
suited to repetitious calculations are available. In addition, special
equipment, such as that described by Newman [74], can be constructed to
carry out information-type analysis.

Miller and Frick proposed that the quantity which is called redundancy
in communication problems (section I.3.6) be the index of behavioral
stereotypy in behavioral applications. It will be recalled that this is defined
as

    $1 - \frac{H}{\max H}.$

It is a quantity which is 1 when the behavior is completely stereotyped and
0 when each of the several alternatives arises with equal probability. A
value of k for the index means that, on the average, a fraction k of the
responses are completely determined and the remainder, 1-k, are maximally
uncertain.
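
A minimal sketch of the index follows. It treats only the zero-order case, in which H is estimated from the relative frequencies of single responses and max H is the logarithm of the number of alternatives; sequential dependencies, which are the real interest of Miller and Frick, would require conditional frequencies and are not handled here. The function name and sample sequences are hypothetical.

```python
import math
from collections import Counter


def stereotypy_index(responses, n_alternatives):
    """Zero-order index 1 - H / max H for a sequence of responses.

    H is the sample entropy of the single-response frequencies and
    max H = log2(n_alternatives).
    """
    if n_alternatives < 2:
        raise ValueError("need at least two alternatives for max H > 0")
    n = len(responses)
    counts = Counter(responses)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - h / math.log2(n_alternatives)


if __name__ == "__main__":
    print(stereotypy_index("LLLLLLLL", 2))   # one response only
    print(stereotypy_index("LRLRRLLR", 2))   # both responses equally often
    print(stereotypy_index("LLLRLLLL", 2))   # intermediate case
```

Note that the second sequence scores 0 on this zero-order index even if the alternation were perfectly regular, which is exactly the kind of sequential structure a higher-order analysis is meant to capture.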


Year by year, following the publication of this paper, there has
been an increase in the number of papers in psychological journals employing
information theory, with almost a flood in 1953. It is not plausible to
suppose that this trend will decrease rapidly, if at all, in the next year
or two, and so we can be sure that any survey will be out
of date before it can be widely read. Yet already there seems to be
some pattern to the publications, and so a summary may serve some function, as
long as it is kept in mind that it is a cross section of an uncompleted trend.

From our knowledge of the theory, it seems reasonable to class the
applications in three categories: 1) Those which employ information theory
to deal with sequential data, as proposed by Miller and Frick. Sections II.2
and II.8 are illustrative of this approach. 2) Those which employ the
formalism of noisy communication (discussed in section I.5) to cope with
problems where stimulus and response are not perfectly correlated, e.g., where
there are errors of some type. Sections II.? and II.7 are typical. 3) Those
which employ the central theorems of information theory concerning rate of
transmission and capacity. Section II.4, and to some extent section II.5,
exemplifies this approach.


In anticipation of our survey, three features of the trend of
application seem worthy of note. 1) As Miller and Frick suggested would be
the case, and as we mentioned in section I.5.1, few of the applications are
to problems usually classified as communication problems. 2) The applications
do not generally employ the fundamental theorem relating channel capacity
and the statistical structure of the source. For example, we know of only
two limited attempts to characterize the capacity of a behavioral system
other than by observations of the actual rate of transmission. 3) The theory
has not generated new problems to be studied in psychology, but rather it
has caused researchers to re-examine old problems from a new point of view.
In some cases (see section II.5) it has permitted several apparently disparate
effects to be included in a single theoretical framework.

The fact that old problems are being considered again does not,
unfortunately, mean that new data are not needed. A published experiment
rarely fulfils exactly the conditions another worker would like, and, more
important, the isolation of sequential dependencies requires a new analysis
of the raw data, and it is very rare indeed to find extensive publications
of raw data.


2. The Entropy of Printed English

A problem which has intrigued a number of authors, including Shannon,
is the estimation of the entropy of printed English (or any other language,
for that matter), i.e., the estimation of the average number of bits per letter
in a written passage. Put another way, the problem is to characterize the
average sequential dependencies in the written language. If we assume -
as may be approximately true - that the English in one book or article is
the typical output of a stationary source, the author, then in principle
all we need do is calculate the probabilities p(b_i, j) for all letters j and
for all (N-1)-tuples b_i of letters and blanks which might precede j. From this
we then compute

    $F_N = -\sum_{i,j} p(b_i, j) \log_2 p_{b_i}(j),$

where b_i denotes a typical block of N-1 successive letters preceding j.
Were these known, then we could estimate the entropy of the sample to
any desired accuracy using the fact that

    $H = \lim_{N \to \infty} F_N.$

The difficulty becomes apparent when it is realized that from a 27 letter
alphabet there are 27^N possible N-grams. Of course, many of these are ruled
out as impossible in English, but even were we to assume that, say, only ten
per cent were possible, there would still be 1,968 cases to be examined with
N = 3, and 53,144 for N = 4.
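
The growth in the number of cases is simple to tabulate. The short sketch below counts the N-grams of a 27-symbol alphabet and, purely for illustration, a tenth and a hundredth of each total; the fractions are assumptions for the example, not estimates of how many N-grams actually occur in English.

```python
ALPHABET_SIZE = 27  # 26 letters plus the blank


def possible_ngrams(n: int) -> int:
    """Number of distinct sequences of n symbols from the 27-symbol alphabet."""
    return ALPHABET_SIZE ** n


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        total = possible_ngrams(n)
        # A tenth and a hundredth of the total, as illustrative fractions only.
        print(n, total, total // 10, total // 100)
```

For N = 3 and N = 4 the totals are 19,683 and 531,441, so one tenth of them gives 1,968 and 53,144 cases respectively.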

Nonetheless, F_N can be computed for very small values of N, and
Shannon [91] reports that

    F_1 = 4.14 bits/letter
    F_2 = 3.56 bits/letter
    F_3 = 3.3 bits/letter

His calculations are based on the letter, digram, and trigram frequencies
which have been prepared for coding work (Pratt [76]). Not only is it
practically impossible to carry this approach much further, but Shannon
suggests that F_3, and all higher F's, may be liable to some error since
many of the N-grams in the sample will bridge between two words. It is
clear that other approximate techniques are necessary.

Three proposals have been made. The first employs, in one way
or another, the built-in knowledge of English statistics in English-speaking
people. The second attempts, through an assumption, to by-pass the sampling
difficulties of the direct procedure discussed above. The last utilizes
the known empirical distributions of English words, though ignoring the
statistical dependencies among words, to determine an upper bound on the
entropy. We shall discuss the proposals in this order.

2.1 Shannon's Upper and Lower Bounds

In his original report, Shannon [pp. 25-26, 88] states that
"The redundancy of ordinary English, not considering statistical structure
over greater distances than about eight letters is roughly 50 per cent."
(The definition of redundancy was given in section I.3.6.) In a later paper
[91] he cites his original estimate as about 2.3 bits/letter. He arrived
at this figure using two techniques. First, he developed approximations to
English using the published frequencies, digram, and trigram frequencies of
letters and the frequencies and digram frequencies of words. The redundancies
in each case were calculated; in the last two cases some extrapolation was
required, since the tables were not complete. Second, he selected passages of
English at random, and using a table of random numbers he deleted (but with an
indication that a deletion had occurred) a certain percentage of the letters.
His subjects then attempted to reconstruct the original passage, and he found
that the letters could be restored with high accuracy when 50 per cent were
deleted, from which he concluded that the redundancy must be at least 50 per
cent.

In this second paper [91], Shannon carries his estimation procedures
further by developing both upper and lower bounds for the entropy, and his
data indicate that the redundancy may be nearer 75 per cent than 50 per cent.
He selected 100 samples of English text, each consisting of 15 letters. A
subject was required to guess at the first letter of a passage until he
obtained it correctly. Knowing it, he guessed at the second until it was
obtained. In general, knowing N-1 letters he guessed at the N-th until he
was correct. The data may be presented as a table having 15 columns and
27 rows (26 letters and a blank). The entry in column N and row S is the number
of times subjects guessed the correct letter on the S-th guess given that they
knew the N-1 preceding letters. A portion of the table is reproduced:

The column marked 100 was obtained by presenting the subject with 99 letters
from a 100 word passage. The data for columns 1 and 2 were prepared from
published word and digram frequencies which are based on far larger samples.
To use these data, Shannon introduced the notion of an ideal
predictor who, knowing p(b_i, j), the probability of all N-grams, would
select letters j in order of decreasing probability for the given b_i. Thus
each letter of a message can be replaced by a number between 1 and 27 which
tells how many guesses will be needed before the correct letter is obtained.
For an ideal predictor this sequence of numbers will contain the same
information as the message, since one can be constructed from the other, but it
has the added feature that there will be limited statistical dependencies
among the numbers, since the difficulty of one will not generally determine
that of the next. Hence, estimating the entropy of the number sequence is
not difficult, and it can be used to estimate the entropy of the language.


The frequency of the number k in the reduced text will, of course,
be given by

    $q_k^N = \sum p(b_i, j),$

where the sum is taken over all (N-1)-grams b_i and over those j such that
it results in the k-th largest probability for the given b_i.
Shannon then shows that the N-th order entropy, F_N, is bounded by

    $\sum_{k=1}^{27} k \left(q_k^N - q_{k+1}^N\right) \log_2 k \;\le\; F_N \;\le\; -\sum_{k=1}^{27} q_k^N \log_2 q_k^N.$

Using the data described above, and smoothing them, Shannon calculated upper
and lower bounds for N = 1, 2, ..., 15, 100. Some of the values are:

    N              1      2      5      10     15     100
    upper bound    4.03   3.42   2.7    2.1    2.1    1.3
    lower bound    3.19   2.50   1.7    1.0    1.2    0.6

    Upper and Lower Bounds on F_N

When both sets of points are plotted for N = 1, 2, ..., 15, there still remains
some sampling error, but smooth curves can be faired through the points
reasonably well.

It should be noted that there is a considerable drop in both
bounds between N = 15 (at which point the curves are nearly flat) and
N = 100. Whether or not this is meaningful is difficult to say, but, as
we shall see, none of the other estimates suggests that the entropy is as
low as 1.3 bits/letter; however, it must be kept in mind that all of these
will be upper bounds, and how much too large they may be is not known.
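
Shannon's two bounds on F_N discussed in this section are straightforward to evaluate once the guess-rank frequencies are in hand. The sketch below assumes the standard form of the bounds (lower: the sum of k(q_k - q_{k+1}) log2 k; upper: the entropy of the q_k), takes q_{28} = 0, and expects the frequencies in non-increasing order as they would be for an ideal predictor. The function name and the sample frequencies are hypothetical, not Shannon's data.

```python
import math


def shannon_bounds(q):
    """Return (lower, upper) bounds in bits on F_N from guess-rank frequencies.

    q[k-1] is the relative frequency with which the correct letter was
    obtained on the k-th guess; the list should sum to 1 and be
    non-increasing.
    """
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    padded = list(q) + [0.0]  # q_{K+1} is taken to be zero
    lower = sum(
        k * (padded[k - 1] - padded[k]) * math.log2(k)
        for k in range(1, len(q) + 1)
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical frequencies: right on the first guess half the time, etc.
    frequencies = [0.5, 0.25, 0.125, 0.125]
    print(shannon_bounds(frequencies))
```

When every rank is equally frequent the two bounds coincide at the logarithm of the number of ranks, and when the first guess is always right both are zero; in between, the gap between them reflects what the rank frequencies alone cannot tell us about F_N.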


2.2 The Coefficient of Constraint

Newman and Gerstman [75] approached the problem in another way which
does not depend on "built-in" knowledge of English statistics, but which
does employ an as yet unproved assumption. They define

    $H(1) = -\sum_i p(i) \log_2 p(i)$

and

    $H(1;N) = -\sum_i \sum_j p(i,j) \log_2 p_i(j),$

where i and j are letters in a passage which are separated by N-1 others.
That is, H(1;N) measures the average statistical dependence of the choice j
on the choice i which was made N letters earlier. As N becomes large it is
clear that this dependence decreases. A measure of its magnitude is

    S_j(N) = H(1;N) - H(1).

They then define a quantity

    $D(N) = 1 - \frac{H(1;N)}{H(1)},$

which is called the coefficient of constraint. It is a quantity which is 1
when the N-th selection is uniquely determined by the first, and 0 when the
N-th is independent of the first. Since only pairs of letters are involved
in these quantities, it is comparatively easy to determine them for a given
sample of language.

Using a 10,000 word sample from the Bible, they obtained the
following data:

    N       2      3      4      5      6      10
    D(N)    .223   .103   .064   .039   .027   .012

and a letter frequency entropy of 4.08, which we observe is slightly different
from the 4.14 obtained by Shannon. A plot of these data on log-log paper is
approximately linear with a slope of -2.0, or, in other words, D(N) = 1/N^2,
approximately.
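
The log-log linearity can be checked with an ordinary least-squares fit to the six tabulated points. The sketch below is only a rough check under that assumption: an unweighted fit of log D(N) against log N, which is not necessarily how the line was drawn on log-log paper. It yields a slope of roughly -1.8, somewhat shallower than -2.0, while 1/N^2 still tracks every point to within a few hundredths.

```python
import math

# D(N) as tabulated for the 10,000 word Bible sample.
CONSTRAINT = {2: 0.223, 3: 0.103, 4: 0.064, 5: 0.039, 6: 0.027, 10: 0.012}


def loglog_slope(points):
    """Unweighted least-squares slope of log(D) against log(N)."""
    xs = [math.log(n) for n in points]
    ys = [math.log(d) for d in points.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    print("fitted slope:", round(loglog_slope(CONSTRAINT), 2))
    for n, d in CONSTRAINT.items():
        print(n, d, round(1 / n**2, 3))  # observed D(N) beside 1/N^2
```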




The problem now is whether we can estimate F_N from data such as
D(N). The answer is 'yes,' provided it is true that

    $F_N \le [1 - D(N)]\, F_{N-1}.$

This relation is certainly true when N = 2 - indeed, the equality holds then -
and it is true for any N such that the symbols are independent, for then
D(N) = 0 and F_N = F_{N-1}. They point out, however, that no proof of the
assumption has been found, and they add without further elaboration the
cryptic comment "... and there are limiting cases in which it is proved not
to apply." [p. 120, 75] In any case, if it is assumed, one has

    $F_N \le F_1 \prod_{i=2}^{N} [1 - D(i)] = H(1) \prod_{i=2}^{N} \left(1 - \frac{1}{i^2}\right) = H(1)\, \frac{N+1}{2N},$

where we have introduced the empirically grounded assumption that
D(i) = 1/i^2. In the limit

    $H = \lim_{N \to \infty} F_N \le H(1)/2,$

which gives an upper bound, if the two assumptions are correct, of 2.04
bits/letter. In addition, for N = 1, 2, ..., 15 they compute H(1)(N+1)/(2N)
and they compare these points with those obtained by Shannon as an upper bound.
This curve seems to fit the points as well as the faired curve of Shannon.
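
The telescoping of the product is easy to confirm numerically. The sketch below multiplies out the factors 1 - 1/i^2 starting from H(1) = 4.08 bits/letter and compares the result with the closed form H(1)(N+1)/(2N); both assumptions of the argument (the inequality itself and D(i) = 1/i^2) are simply taken for granted here, and the function name is hypothetical.

```python
def constraint_upper_bound(h1: float, n: int) -> float:
    """Bound on F_N obtained by applying F_i <= (1 - 1/i^2) F_{i-1} repeatedly."""
    bound = h1
    for i in range(2, n + 1):
        bound *= 1.0 - 1.0 / i**2
    return bound


if __name__ == "__main__":
    h1 = 4.08  # letter-frequency entropy reported for the Bible sample
    for n in (1, 2, 5, 10, 15, 100, 10_000):
        closed_form = h1 * (n + 1) / (2 * n)
        print(n, round(constraint_upper_bound(h1, n), 3), round(closed_form, 3))
```

The bound falls from 4.08 at N = 1 toward H(1)/2 = 2.04 as N grows, with most of the drop occurring in the first few steps.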


2.3 Distribution of Words and Letter Entropy

The third, and last, major approach to setting bounds on the letter
entropy rests on a computation of word entropies which is based on known
frequencies of word use in the language. This entropy, when divided by the
mean word length, affords an estimate of the letter entropy which is only
an upper bound, since the technique, based as it is only on word frequencies,
ignores completely the redundancy due to inter-word influences.

Long before information theory, people had determined the frequency
of usage of various words, and it was Zipf [102] who observed that if we rank
words 1, 2, ..., r, ... in order of decreasing frequency, then the frequency
of use of a word is simply proportional to the inverse of its rank. That is,
the probability p_r that a randomly selected word is of rank r is given,
approximately, by

    $p_r = k/r,$

where k is a proportionality factor independent of r. There is a certain
ambiguity as to just how many ranks there are, and certainly if we consider all
possible English words the approximate law fails for very high ranks. The
value of k is chosen so that the relation, known as Zipf's law, holds for the
lower ranks, and the size N of the vocabulary is given by the condition

    $\sum_{r=1}^{N} p_r = 1.$

Newman and Gerstman [75], Miller [67], and Shannon [91] have all carried out
this computation, but as Newman and Gerstman point out, there are certain
discrepancies in the results. Shannon obtains N = 8,727, while Miller,
presumably using a definite integral to approximate the series, gets 22,000,
and Newman and Gerstman obtain 12,370 by taking into account the discontinuity
of the first 100 ranks and approximating the rest of the series by an integral.
Using this distribution, it is then possible to calculate the
entropy of the independent word selections according to the distribution,

    $-\sum_{r=1}^{N} p_r \log_2 p_r.$

Shannon obtains 11.82, Miller 10.6, and Newman and Gerstman 9.7 bits/word.
These give estimates of 2.62, 2.36, and 2.16 bits/letter if we take 4.5
letters to be the average word length. There appears to be a further
disagreement, as was pointed out by Newman and Gerstman [p. 122, 75].
Considering the different values of N obtained, both the Shannon and the Newman
and Gerstman results should be on the same side of the Miller result; they are
not.
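
The vocabulary size and word entropy implied by Zipf's law can be obtained by brute-force summation rather than by an integral approximation. The sketch below assumes k = 0.1, a value that is not stated in the text above and is used here only for illustration, and it sums the series term by term until the probabilities reach unity. Under that assumption the summation stops a little above 12,000 ranks with an entropy of about 9.7 bits/word, which is close to the Newman and Gerstman figures; it does not reproduce their treatment of the first 100 ranks.

```python
import math


def zipf_vocabulary(k: float = 0.1):
    """Sum p_r = k / r until the total reaches 1.

    Returns (N, entropy in bits/word), where N is the first rank at which
    the cumulative probability is at least 1. k = 0.1 is an assumed value.
    """
    total = 0.0
    entropy = 0.0
    rank = 0
    while total < 1.0:
        rank += 1
        p = k / rank
        total += p
        entropy -= p * math.log2(p)
    return rank, entropy


if __name__ == "__main__":
    size, bits_per_word = zipf_vocabulary()
    mean_word_length = 4.5  # letters per word, as used in the text
    print(size, round(bits_per_word, 2), round(bits_per_word / mean_word_length, 2))
```

Dividing the word entropy by the mean word length of 4.5 letters converts it to the bits/letter figure quoted for each author.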

Another approach to the problem from the point of view of words
is due to Bell [2]. He supposes that the space between words is sent infallibly
and then he observes that the length of a word carries some information. "As
the simplest example, consider the fact that there are only two words of
one letter in normal use: the personal pronoun 'I' and the indefinite
article 'a.' Hence only two out of the 26 single-letter 'words' which are
mathematically available from the alphabet are admitted to the English
language, and it follows that when a word of one letter is received in English
the choice is only 1 out of 2 instead of 1 out of 26. An alternative
expression of this is that the 'internal information' implicit in the fact
that the 1-letter word is in the English language [is] equivalent to a selection
of 1 out of 13 alternatives; and the communication of a selection of 1 out
of 13 would be regarded as a communication of 3.7 'bits' of information
(log_2 13 = 3.7), so that average internal information of 1-letter words in
the English language may be stated as 3.7 bits per letter." [p. 384, 2]

For longer words such a detailed analysis is impossible, so he used statistical
samples from the dictionary. From this he calculated the internal information
in bits/letter and he obtained:


    Number of Letters       1     2     3      4      5      6      7      8
    Internal Information    3.7   2.2   1.53   1.93   2.36   2.??   2.98   3.21

This curve was smoothly extrapolated for words longer than 8 letters. Using
Dewey's word list [11] to obtain relative frequencies of words of various
lengths he calculated the weighted average of the internal information, and
he obtained 2.1 bits/letter.
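
Bell's one-letter example amounts to taking the logarithm of the ratio of mathematically available words to admitted words. The sketch below reproduces the 3.7 bits for one-letter words and, as an assumption that goes beyond the quoted passage, extends the same ratio to words of L letters by dividing by L to express the result per letter. The function name is hypothetical, and the word count used for two-letter words is an invented illustration, not a dictionary figure.

```python
import math


def internal_information_per_letter(word_length: int, admitted_words: int,
                                    alphabet_size: int = 26) -> float:
    """log2(available / admitted) per letter for words of a given length.

    The extension beyond one-letter words is an assumed generalization of
    Bell's example, not a formula quoted from his paper.
    """
    available = alphabet_size ** word_length
    return math.log2(available / admitted_words) / word_length


if __name__ == "__main__":
    # Bell's example: 2 admitted words ('I' and 'a') out of 26 possibilities.
    print(round(internal_information_per_letter(1, 2), 1))
    # Hypothetical count of two-letter words, for illustration only.
    print(round(internal_information_per_letter(2, 32), 1))
```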
2.4 The Role of Redundancy

Whatever the exact value of the letter entropy is, it is clear
that it is not much over 2 bits/letter and not much less than 1, and so the
redundancy is somewhere between 50 and 75 per cent. In other words, we
could transmit the same information as we do either by using a considerably
smaller alphabet and keeping the length of books and articles the same, or
by keeping the same number of symbols in the alphabet and reducing sentences
and books to from one quarter to one half their present length. That our
language is not fully efficient in this statistical sense presumably results
from our need to communicate rapidly and accurately under adverse conditions,
i.e., where there is noise: in the presence of other voices, in the wind, at
sea, etc. It is clear from the little example given in section I.4.3 that
even a small amount of noise can result in a serious drop in the information
transmitted - in that case a one per cent chance of error resulted in a
ten per cent drop in the entropy. It thus appears reasonable that if a
language is designed to cope with even a slight amount of noise, then the
redundancy must be quite high indeed. Of course, when the noise level is so
high that the natural redundancy of the language is unable to combat it,
other methods are used, e.g., words and even whole sentences are repeated,
and in such places as factories the vocabulary between two people may be
reduced to a few words - possibly, to 'stop' and 'go.'

An example of a purposeful increase in redundancy is found in the
very formal language used for air traffic control at an airport. Frick and
Sumby [20] have presented a summary of their findings for this language,
but without much of the data. They used the technique, introduced by
Shannon [91], of having subjects predict the next letter of a message. Using
trained personnel as subjects they found that the uncertainty of control
tower language is about 23 per cent that of random sequences of letters and
spaces. And this, they point out, is a serious overestimation, since in
practice the operator almost always knows the pilot's situation and therefore
certain messages are excluded. To estimate these situational constraints,
they described hypothetical situations to 100 Air Force pilots and asked
them to predict the control tower message. Forming equivalence classes of
'meaning units' and taking into account the imposed grammar of the language,
they found that the uncertainty was no more than 20 per cent of what it would
have been had the units been equally likely and randomly selected. The overall
effect, they estimate, is a redundancy of about 96 per cent. This is not
an implausible result when one considers the high noise level in both the
tower and the plane, and especially the low margin of allowable error.

A similar study of tower-pilot communications at the Langley
Air Force Base has been presented by Felton, Fritz, and Grier [18]. As
in the Frick and Sumby work, they divide messages into information elements -
"... a word or a group of words representing a type of information, such as
runway assignment, elapsed time, etc." [p. 5, 18] They divided the analysis of
redundancy into three levels: first, they simply took into account the
frequencies of the various information elements; second, they determined the
predictability within a message; and third, they determined the predictability
between messages from the observed conditional probabilities between messages.
At the second level, they determined the probability of each message and
determined the entropy of whole messages. This divided by the average number
of elements per message was taken to be the entropy of each element. A
justification of this procedure was given. The data are separated into messages
originated in the air and at the tower, and the estimated redundancy using
each of the three levels is presented:


Level        1       2       3

Air         .35     .72     .81
Tower       .26     .7?     .?8

               Redundancy

The authors estimate that if contextual constraints are taken into account,
as they were in the Frick and Sumby paper, then the redundancy would be
about 93 per cent, which compares closely with the 95 per cent mentioned
above.
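
The first-level calculation described above can be sketched numerically. The
following is a minimal illustration, assuming that redundancy is taken as one
minus the ratio of the observed entropy to the maximum entropy for the same
number of distinct elements; the element names and counts are hypothetical and
are not data from [18].

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy, in bits per element, of an empirical frequency table."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def redundancy(counts):
    """Redundancy = 1 - H / H_max, where H_max = log2(number of distinct elements)."""
    h_max = math.log2(len(counts))
    if h_max == 0:
        return 0.0  # a single element carries no information either way
    return 1.0 - entropy_bits(counts) / h_max


# Hypothetical tally of information elements (illustrative only).
elements = Counter({"runway": 40, "wind": 30, "altimeter": 20, "traffic": 10})
h = entropy_bits(elements)
r = redundancy(elements)
print(f"entropy = {h:.3f} bits/element, redundancy = {r:.3f}")
```

The second and third levels would replace the element frequencies by message
probabilities and by conditional probabilities between messages, respectively;
the same ratio then applies.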

3. Distribution of Words in a Language

In the last section we used the empirically grounded observation
of Zipf that if the words of a natural language are ranked from the most to
the least common then the frequency of the r-th word is approximately inversely
proportional to r. Zipf found that more linguistic data could be fit by
this more general equation

    p_r = P r^{-B}

where p_r is the frequency of the r-th word and P and B are constants, B
being in the neighborhood of 1 for all languages. "Although this relation
appears with regularity in linguistic data, no one has claimed more than a
vague appreciation of its cause or significance. No one, that is, until
Mandelbrot." [p. 413, 72] Mandelbrot has discussed his work in several
places [57, 58, 59, 60], the clearest probably being [60], and Miller [72]
has given a very helpful summary of it.
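
To make the generalized relation concrete, the constants P and B can be
estimated from ranked frequencies by a straight-line fit on logarithmic axes,
since log p_r = log P - B log r. The sketch below is one simple way to do
this (ordinary least squares); it is not the procedure used by Zipf or
Mandelbrot, and the input is synthetic data generated for illustration.

```python
import math


def fit_zipf(frequencies):
    """Least-squares fit of log p_r = log P - B * log r. Returns (P, B)."""
    ranked = sorted(frequencies, reverse=True)  # rank 1 = most frequent
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(p) for p in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic frequencies following p_r = 0.1 / r exactly (illustrative only).
synthetic = [0.1 / r for r in range(1, 201)]
P, B = fit_zipf(synthetic)
print(f"P = {P:.4f}, B = {B:.4f}")
```

With real word counts the fitted B would be expected to lie near 1, and the
residuals at the extreme ranks indicate how well the simple law holds there.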

Mandelbrot started with the assumption that the language - like
all known ones - is discrete, i.e., that communication is by means of units
called words which are separated by a space. He further assumed that the
transmitter in the communication system encodes and the receiver decodes
word by word. "Although it may seem trivial, the introduction of the space
between words is the crux of Mandelbrot's contribution and the main feature
that leads him to results different from Shannon's. In Shannon's problem,
the entire message is remembered and then coded in the most efficient form
for transmission. In Mandelbrot's problem, the message is remembered only
one word at a time, so that every time the space occurs the transmitter makes
the most efficient coding he can of that word and then begins anew on the next
word. Obviously, a transmitter of the kind Shannon studied will be more
efficient, but one of the kind that Mandelbrot is studying will be more
practical..." [72]



Let us assume that the words are ordered by decreasing frequency
of occurrence; denote them by W_1, W_2, ..., W_R. Let the corresponding
frequencies of occurrence be p_1, p_2, ..., p_R. Let us suppose that to each
word W_r there will be a cost C_r for using it - we do not specify what we mean
by cost except that it can be summarized by a real number. It might be the
number of bits required to transmit it, or the delay, etc. The first problem
Mandelbrot attacked, which he called the 'direct problem,' is to find what
the costs C_r should be so as to result in the least costly transmission of
messages assuming word-by-word coding and known frequencies p_r. This condition
yields, as a first approximation,

    C_r = [\log_M r]

where [x] denotes the next integer following x. A better approximation is

    C_r = [\log_M (r + m) + \log_M d]

where M, m, and d are constants independent of r. Observe that the cost
depends on the ranking, but not on the details of the probability distribution.
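
The first approximation can be illustrated by a short computation. The sketch
below assumes that 'the next integer following x' may be read as the smallest
integer not less than x, so that the cost of the word of rank r is the number
of M-ary digits needed to count up to r; the function name is ours, introduced
only for illustration.

```python
def direct_cost(rank, M=2):
    """Smallest integer k with M**k >= rank, i.e. the ceiling of log_M(rank).

    Integer arithmetic is used to avoid floating-point rounding at exact powers.
    """
    if rank < 1:
        raise ValueError("ranks start at 1")
    k, power = 0, 1
    while power < rank:
        power *= M
        k += 1
    return k


# Costs for the eight most frequent words under a binary code (M = 2).
costs = [direct_cost(r) for r in range(1, 9)]
print(costs)
```

Note that only the rank enters the calculation: two vocabularies with quite
different frequency distributions but the same ordering receive the same
costs, which is the observation made in the text.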

Next, we turn to what Mandelbrot called the 'inverse problem.'
In this problem he assumed the words given and their costs fixed, and the task
was to determine the frequency distribution p_r such that some economy criterion
is met. He has given several criteria which all lead to essentially the
same result.

1. Let us suppose that the average cost per word,

    C = \sum_r p_r C_r,

is fixed in advance, and we look for the best frequency distribution to
transport information (in Shannon's sense). That is, we maximize
H = -\sum_r p_r \log p_r subject to the above constraint. (This problem is
formally identical to Boltzmann's problem in statistical mechanics: to find
the maximum entropy for a given average energy.) The following conditions
are necessary and sufficient to solve the problem:

    p_r = P M^{-B C_r}
    B > 0
    \sum_r p_r = 1
    \sum_r p_r C_r = C

The third condition determines P, and the fourth B, provided that
C < \log_M R. Note: the cost C_0 of the space does not enter here.


2. A second condition, which is a trivial modification of the
first, is to hold H fixed and choose the distribution so as to minimize the
average cost C. The only difference that results is that B is determined
by the value of H, provided H < \log R. Again the value of C_0 is irrelevant.


3. A more interesting variant is to allow H and C to be free
and to minimize the average cost per unit of information; i.e., minimize

    \frac{C_0 + \sum_r p_r C_r}{-\sum_r p_r \log p_r}

subject to the constraint \sum_r p_r = 1. As before, we find that

    p_r = P M^{-B C_r}

but now B is determined by the value of C_0, and so both the value of C and
of H are fixed by the choice of C_0.
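
The first criterion can be worked numerically. The sketch below assumes a
given list of word costs and a target average cost, and finds by bisection the
value of B for which p_r = P M^{-B C_r} meets the normalization and
average-cost conditions. The function name and the sample costs are
hypothetical, introduced only for illustration; the bisection is our choice of
method and is not taken from Mandelbrot.

```python
def boltzmann_word_distribution(costs, mean_cost, M=2.0, tol=1e-12):
    """Return (B, p) with p_r proportional to M**(-B * C_r),
    sum(p) = 1 and sum(p_r * C_r) = mean_cost."""

    def dist(B):
        weights = [M ** (-B * c) for c in costs]
        total = sum(weights)  # 1 / total plays the role of the constant P
        return [w / total for w in weights]

    def average_cost(B):
        return sum(p * c for p, c in zip(dist(B), costs))

    # B > 0 requires the target to lie below the average cost of the
    # equiprobable distribution (B = 0) and above the cheapest word.
    if not (min(costs) < mean_cost < average_cost(0.0)):
        raise ValueError("mean_cost is outside the range attainable with B > 0")

    lo, hi = 0.0, 1.0
    while average_cost(hi) > mean_cost:  # average cost decreases as B grows
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_cost(mid) > mean_cost:
            lo = mid
        else:
            hi = mid
    B = 0.5 * (lo + hi)
    return B, dist(B)


# Hypothetical costs for seven words, cheapest first (illustrative only).
costs = [1, 2, 2, 3, 3, 3, 3]
B, p = boltzmann_word_distribution(costs, mean_cost=2.0)
print(f"B = {B:.4f}", [round(x, 4) for x in p])
```

The second criterion leads to the same family of distributions with B fixed
by the entropy instead of the average cost, so the same search could be run
on the entropy of dist(B).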


Finally, we turn to what Mandelbrot called the 'secrecy problem.'
He supposed that the words are composed of letters L_1, L_2, ..., L_G, where G
is much smaller than R. Let the letters be labeled in order of decreasing
frequency; denote the frequency distribution by q_g, and write the cost of
the g-th letter as c_g. The cost of a word is assumed to be given by the sum
of the costs of its component letters.

"The best possible of all weighted vocabularies from the point of
view of the secrecy encoder is the one in which the most economical code is
also unbreakable. The code must then be a random sequence of elements, space
included, and the enemy must either go to word relationships, that is go
beyond our approximation, or try all keys, the number of which is astronomical."
[p. 131, 60] The requirement he places is that an unbreakable random sequence
of letters transport information for the smallest possible cost per unit of
information. This is similar to condition 3 of the inverse problem, differing
however in that there is no element corresponding to the word space. Formally,
the condition is that

    \frac{\sum_g q_g c_g}{-\sum_g q_g \log q_g}

should be a minimum subject to the condition that \sum_g q_g = 1. From this
requirement it can be shown that the word distribution must be

    p_r = P M^{-B C_r}

as before, but with the added conditions that B > 1 and R = \infty. The latter
condition follows from the requirement of a random sequence of letters to
sustain secrecy. We shall discuss the condition B > 1 a little later.

Let us summarize: to attain the least costly transmission when
words are ranked in order of decreasing frequency, then

    C_r = [\log_M (r + m) + \log_M d].

To attain 1) the maximum information transport with the average cost per
word fixed, or 2) the minimum average cost per word with the information
transported held fixed, or 3) the minimum average cost per unit of information,
then the distribution of the words should be

    p_r = P M^{-B C_r}.

If we combine these two conditions, taking into account the fact that
statistical fluctuations in data will smooth over the steps of the former
equation, we obtain

    p_r = P (r + m)^{-B},

with the constant factor absorbed into P, which Mandelbrot has called the
'canonical curve.' Observe that if m = 0, this is the generalized Zipf law.

As Mandelbrot points out, the fit of Zipf's law to most language
data is good only in the central range and it is in error for the most
frequent and the least frequent words. By choosing values of B and m
different from 1 and 0 he has been able to achieve far better fits.
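
The step from the cost formula and the exponential distribution to the
canonical curve can be written out as follows. This is a sketch under two
assumptions: that the integer rounding in the cost formula is ignored (the
smoothing mentioned in the text), and that the additive constant in the cost
is written \log_M d.

```latex
\begin{aligned}
C_r &\approx \log_M (r + m) + \log_M d = \log_M \bigl[ d\,(r + m) \bigr], \\
p_r &= P\, M^{-B C_r}
     = P\, M^{-B \log_M [ d (r + m) ]}
     = P\, d^{-B}\, (r + m)^{-B}, \\
p_r &= P'\,(r + m)^{-B}, \qquad P' = P\, d^{-B}, \\
m = 0 &\;\Longrightarrow\; p_r = P'\, r^{-B}.
\end{aligned}
```

The last line is the generalized Zipf law, so that law appears as the special
case of the canonical curve in which the rank offset m vanishes.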

The condition B > 1 which results from the secrecy criterion
has been found to be met by most natural languages. Zipf called those with
B > 1 'open vocabularies' and those with B < 1 'closed vocabularies.' Most
languages with closed vocabularies are in some way peculiar or special.

Clearly, Mandelbrot's theory, like Shannon's, is normative, but
it is much more closely related to a specific empirical field than is
Shannon's. Thus the question must be raised as to exactly what Mandelbrot
has shown and what it means for linguistics. "He says that if one wants
to communicate efficiently word-by-word, then one must obey Zipf's law.
There is a strong temptation to reverse the implication and argue that
because we obey Zipf's law we must therefore be communicating word-by-word
with maximal efficiency." [72] Of course, Miller goes on to point
out that much other evidence exists - such as the redundancy data discussed
in the last section - to suggest that this reversed implication is false.
It remains to be seen whether it can be shown that marked deviations in
certain directions from perfect efficiency result in only slight deviations
from the canonical curve.

4. The Capacity of the Human Being and Rates of Information Transfer

In recent years it has proved necessary to construct a variety
of complex information-processing systems in order to deal with certain
military and industrial problems. These systems typically receive, from
diverse sources, a tremendous amount of raw information which must be
filtered, recoded, and correlated into what may be called a model of some
situation of interest. The model must be sufficiently simple so that a
person can grasp it completely, and sufficiently accurate so that he can
reach useful decisions on the basis of it. For example, an air defense
system receives raw information from radars, spotters, airline schedules,
weather reports, fighter readiness reports, etc. All of this must be reduced
to a simplified model of the enemy attack, the defense facilities, and the
defensive response, so that a commanding officer, with only a few seconds'
or minutes' delay, can know the situation continuously. The officer must
make and modify his defensive decisions on the basis of such a model. It
is clear that much of this processing - especially where speed and accuracy
are needed - can and should be reduced to machine operations, but, with our
present technology, there are certain steps which are far more simply and
effectively carried out by a person than by a machine. For example, one
of the first steps in an air defense system, and one which is not easily
duplicated by a machine, is the isolation and transfer of pertinent information
from a radar scope face. From all the random noise and background reflections
on the scope an operator must single out those 'blips' which are aircraft,
and this he must introduce into the rest of the system, say, as a
telephone message. The question arises as to how much information he can
process per second over a sustained period.

An answer can be obtained by direct experiments on the trained personnel
using the equipment. On the other hand, the question arises whether it is
necessary to study each new situation separately, or whether the pertinent
variable is the amount of information in bits/sec which will be presented to
the operator as compared with the maximum amount he can handle.

That is, can we treat a human being as a channel and so determine
a channel capacity for him? If this is possible, it will certainly simplify
the design problem, for it is generally not too difficult to determine the
rate of information flow in the machine components of a system. The question
of whether it is useful to treat men as channels in certain situations remains,
in the opinion of many, still an open problem. This is not our question here;
we need only recount some of the studies which have been executed to determine
his capacity under the assumption that he can in fact be usefully considered
as a channel.

Considering the theory presented in part I, two procedures to
estimate the capacity seem possible. First, from whatever physical,
physiological, and psychological facts are known and relevant to the type of
transmission being employed, to make an estimate of the channel capacity.
Second, by varying certain variables and by employing diverse coding schemes,
to find the maximum amount of information which he can be caused to handle.
This, by the fundamental theorem of information theory, affords a lower bound
on the capacity. Roughly speaking, the first procedure has resulted in upper
bounds of the order of 10,000 bits/sec, while the second yields a lower
bound somewhere in the range of 10 to 100 bits/sec. The consensus is that the
lower bound more nearly represents the human capacity, but no really strong
argument exists to support this view except that no one has yet devised a
way to achieve a higher rate. We shall now examine these estimates in a
little more detail.


4.1 Upper Bounds

Possibly part of the difficulty in obtaining a satisfactory
estimate using the first procedure is the present lack of an adequate model
for what happens functionally within a person when he is processing information.
Thus, independent measurements on most of the 'channel' - which is surely
not homogeneous in its properties - cannot be had. As a result, the estimates
which have been made are in a sense only concerned with the peripheral
aspects of the channel. We will cite in a moment another reason which has
been offered to explain the difference between the upper and lower bounds.
Licklider and Miller [53] have pointed out that an estimate of
the capacity with respect to auditory signals can be obtained from a result
of the theory of information for continuous systems (see the appendix). It
is known that if the bandwidth of the channel is W cycles/sec, and if the
noise and the signal are simply additive with a power ratio of P/N, then the
capacity in bits/sec is given by

    C = W \log_2 (1 + P/N).

For auditory signals a bandwidth of 5,000 cycles/sec is conservative and a
signal-to-noise ratio of 30 db, or a power ratio of about 1,000, is not
unusual, in which case the capacity must be about 50,000 bits/sec. In
actual attempts to transmit selective information by auditory means, a rate
as high as 50 bits/sec is unusual. In other words, the efficiency of the
auditory system must be considered to be about 0.1 per cent. Licklider and
Miller offer the explanation that most of the information transmitted by an
auditory signal is personal information about the originator - his way of
speaking, his mood, and some of his linguistic history. While this may well
be the case, it is interesting that no one has yet devised a way to use this
apparently available capacity for the transmission of preassigned selective
information.
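
The figures quoted above can be checked directly from the capacity formula
for a band-limited channel with additive noise, C = W log2(1 + P/N). The
sketch below simply evaluates it for the bandwidth and signal-to-noise ratio
given in the text; the function name is ours.

```python
import math


def channel_capacity(bandwidth_hz, snr_db):
    """Capacity in bits/sec of a band-limited channel with additive noise."""
    power_ratio = 10 ** (snr_db / 10)  # 30 db corresponds to a ratio of 1,000
    return bandwidth_hz * math.log2(1 + power_ratio)


capacity = channel_capacity(5000, 30)  # a little under 50,000 bits/sec
observed_rate = 50                     # bits/sec, the high figure cited above
efficiency = observed_rate / capacity  # roughly one part in a thousand

print(f"capacity = {capacity:.0f} bits/sec, efficiency = {100 * efficiency:.2f} per cent")
```

Because the capacity grows only logarithmically with the power ratio, even a
much poorer signal-to-noise ratio would leave the estimate far above the
observed rates.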

A far more detailed estimate of auditory capacity has been made
by Jacobson [44, 45] using various data about hearing, such as the total
number of monaurally distinguishable tones. He concludes from his analysis
that one ear should be able to handle about 8,000 bits/sec, and with very
loud sounds, 10,000 bits/sec. It is known that there are approximately
29,000 ganglion cells from the ear; hence the average rate of information
transfer over a nerve fiber is about 0.3 bits/sec. However, he points out
that "It is very unlikely that there is any binary or similar coding in the
cochlear nerves. It is consequently not particularly meaningful to state
that the average informational capacity of a single cochlear fiber is about
0.3 bits/sec." [pp. 470-471, 45] This result, however, can be translated
into the equivalent number of tones which can be distinguished on one fiber,
and he obtains [illegible] tones/sec.

Jacobson [46] has also carried out a similar calculation for the
eye, taking into account facts known about discriminability, etc., but
ignoring the effect of color. He obtains an estimate of 4.3 x 10^6 bits/sec
for each eye. From this one can conclude the maximum average rate over
each neural fiber must be 5 bits/sec. The inclusion of color would, of
course, raise this estimate.
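
The per-fiber figures follow from simple division, as the short check below
shows. The ear values are those given in the text; for the eye, the text gives
only the total and the per-fiber rate, so the number of fibers printed here is
merely what those two figures imply and is not a value quoted from Jacobson.

```python
# Ear: total capacity and number of ganglion cells as given in the text.
ear_bits_per_sec = 8_000
ear_ganglion_cells = 29_000
ear_bits_per_fiber = ear_bits_per_sec / ear_ganglion_cells  # about 0.3 bits/sec

# Eye: total capacity and per-fiber rate as given in the text; the fiber
# count is derived from them, not quoted from the source.
eye_bits_per_sec = 4.3e6
eye_bits_per_fiber = 5
implied_eye_fibers = eye_bits_per_sec / eye_bits_per_fiber

print(f"ear: {ear_bits_per_fiber:.2f} bits/sec per fiber")
print(f"eye: about {implied_eye_fibers:,.0f} fibers implied")
```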

So far as we have determined, these are the only estimates of
channel capacity which are based on measurements independent of the actual
rate of information flow. We turn now to estimates of how rapidly information
of a particular type can be, or rather, has been, caused to pass through
a person.

4.2 Lower Bounds: Maximum Observed Rates of Information Transfer

Let us first consider the transmission of language encoded information.
Miller [67] points out that if we consider the average measured length
of vowels and consonants - about 12.5 sounds/sec - and if we were to suppose
that they are equally likely and independently selected, then speech would
convey information at a rate of 67 bits/sec. If, however, we take into
account their relative frequencies (Dewey [11]), then the rate is reduced
to about 60 bits/sec. Further, if we take into account the fact that vowels
and consonants tend to alternate in English, the estimate is only 45 bits/sec.
Finally, on the basis of Zipf's law, Miller estimated that there are 10.6
bits/word (section II.2.3). Since a speaker can sustain a maximum of about
3 words/sec, the transmission rate using speech can be no more than 32 bits/sec.
"The maximum efficiency within the restriction imposed by the phonetic
structure of English words, therefore, is about 50 per cent." [p. 798, 67]
In practice, however, an ordinary speaking vocabulary is not as large as
assumed when Zipf's law is called for, nor can a person usefully employ a
speaking rate of 3 words/sec. An assumption of an equi-probable distribution
over a vocabulary of 5,000 words which are spoken at a rate of 1.5 words/sec
yields an information rate of 15 bits/sec.
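
The arithmetic behind the first and last of Miller's figures can be retraced
as follows. This is only a sketch of the bookkeeping: the number of equally
likely speech sounds is not stated in the text and is inferred here from the
quoted rate, so it should be read as an implied value.

```python
# Figures quoted in the text.
sounds_per_sec = 12.5
equiprobable_rate = 67.0   # bits/sec if sounds are equally likely and independent
bits_per_word = 10.6       # Miller's estimate based on Zipf's law
max_words_per_sec = 3.0

# Implied by the first two figures (not stated in the source).
bits_per_sound = equiprobable_rate / sounds_per_sec
implied_sound_inventory = 2 ** bits_per_sound

# Word-based ceiling and its ratio to the sound-based figure.
word_rate = bits_per_word * max_words_per_sec
efficiency = word_rate / equiprobable_rate

print(f"{bits_per_sound:.2f} bits/sound, about {implied_sound_inventory:.0f} sounds")
print(f"word-based rate = {word_rate:.1f} bits/sec, efficiency = {efficiency:.2f}")
```

The ratio comes out a little under one half, which is the 'about 50 per cent'
of the quotation.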

In addition, as Quastler and Wulff [8?] point out, the various
rate estimates using Zipf's law ignore the constraints among words. They
cite evidence which suggests that the guessing of a missing word within
context may be correct as much as 30 per cent of the time. This reduces
the information transmission rate to about 7 or 8 bits/word, and if we
assume that 15 per cent of the words are incorrectly received, the estimate
must be reduced to 6 or 7 bits/word. Using Miller's speaking rate of 1.5
words/sec, it appears that from 10 to 20 bits/sec is a good average rate of
transmission, and that with rapid speech the rate may get as high as 25
bits/sec.

Quastler and Wulff report data on several other methods of
information transfer, and in summary they find that 25 bits/sec seems to
be the maximum rate. In all cases, a mechanical response was required of
the subject. Yet they verified that mechanical limitations were not determining
the apparent rate by showing that higher rates could be achieved if memorized
materials were used. The first experiment they discussed was based on
typing, but it was known a priori that this would not yield the fastest
possible rates, since text can be read aloud faster than a typist can take
it down. For this experiment, random sequences of letters were drawn from
alphabets of 4, 8, 16, and 32 symbols. Three typists with from 5 to 12
years' experience were paced by a metronome at 2, 3, 4, and 6 beats/sec.
In general, the errors which occurred were the transposition of letters,
and so it is a question as to whether these should be treated as one or
two errors. Depending on this, we obtain the following upper and lower
bounds on information transmitted (section I.5):


Alphabet size       4       8      16      32

Upper Bound        ?.7    10.5    13.2    16.?
Lower Bound        3.8     7.4    11.8    13.4

        Information Transmitted in bits/sec


It was found, as would be expected, that with the higher metronome
speeds and with the larger alphabets, the greater percentage of errors
occurred. For 8 and 16 symbol alphabets a speed of 3.2 ± 0.2 keys/sec
represented the highest effective speed, and beyond that their precision
so decreased as to keep the transmission rate about constant, and beyond
4 keys/sec the quality of their output decreased very rapidly. With 4
symbols the effective speed was 3.6 keys/sec, and with 32 it was 2.9 keys/sec.
When the subjects were not driven by a metronome, but were instructed to
type as rapidly as possible, it was found that the rate of transmission was
down about 9 per cent.

A second experiment drew on the sight-reading ability of three
young pianists. They were presented with random music (notes selected
using random numbers) and they were paced by a metronome which was gradually
increased in tempo over trials. Tape recordings were made and each of the
subjects scored each of the tapes for errors. The agreement was fair, but
both a low count (errors detected by each subject) and a high count (those
detected by at least one) were determined. The information transmitted
was computed from the error count and from assumptions about the error
pattern. Again, several different 'alphabets' were employed: 3, 4, 5, 9,
15, 25, and 37 keys.

The data show that the highest speed for which the error rate
remained low decreases from 7 keys/sec for an alphabet of 3 or 4 keys to
4.3 keys/sec for the 37 key alphabet. This decreased speed, coupled with
an increase in error rate, keeps the information transmission rate at about
22 bits/sec over a fairly wide range of speed and alphabet sizes.
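
For orientation, a key rate can be converted to an information rate by
multiplying by the logarithm of the alphabet size. The sketch below computes
only this error-free ceiling, assuming single keys that are equally likely and
independently selected; it is not the calculation of transmitted information
used by Quastler and Wulff, which also takes the errors into account.

```python
import math


def presented_rate(keys_per_sec, alphabet_size):
    """Bits/sec presented when each key is one of alphabet_size equally
    likely, independently chosen symbols (an error-free upper limit)."""
    return keys_per_sec * math.log2(alphabet_size)


piano_37 = presented_rate(4.3, 37)   # pianists, 37 key alphabet
typing_16 = presented_rate(3.2, 16)  # typists, 16 symbol alphabet

print(f"37 keys at 4.3 keys/sec: {piano_37:.1f} bits/sec")
print(f"16 symbols at 3.2 keys/sec: {typing_16:.1f} bits/sec")
```

Any errors reduce the transmitted information below these ceilings, which is
why a falling key rate combined with a rising error rate can hold the
transmitted rate roughly constant as the alphabet grows.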

In contrast to the typing experiment, individual differences
became apparent when the subjects attempted to exceed their limits. One
kept the error rate low by failing to keep up with the metronome, another
kept the pace but allowed the error rate to become large, and the third
held the pace for periods and then he would lose the beat.

A third set of materials for determining capacity which Quastler
and Wulff have studied is mental arithmetic problems. They point out that
if certain plausible assumptions are made about the information involved
in calculations, and if the published time data on so-called 'lightning
calculators' (people who are noted for rapid mental calculations) are used,
one obtains an estimate of 22 to 24 bits/sec for the transmission rate. The
feat of these people appears, therefore, not to be a high rate of information
transmission, but rather a tremendous storage of information for short
periods of time. In addition, Quastler and Wulff conducted some simple
experiments on mental addition of columns of figures. On the average
they obtained, again making some plausible, but debatable, assumptions,
a rate of 6 to 1? bits/sec, but one exceptional subject sustained a rate of
23 bits/sec.

From these data, and others not published, it appears that it
is difficult to cause a subject who is employing familiar operations to
exceed - let us be generous - 50 bits/sec, even though present estimates
of ear and eye capacity exceed this several hundred times. It certainly
seems an open problem to bring these two estimates closer together, either
by devising a method to employ much more of the apparent capacity to transmit
selective information, or by a more detailed analysis of the human being
as a channel to show that 50 or 100 bits/sec is truly his limit. Jacobson's
comments on this disparity are of interest: "Thus it is evident that the
brain can digest generally less than 1 per cent of the information our ears
will pass. It must be appreciated that the ear is a channel vastly wider
than its apprehensible output. It is the ability of the brain to scan for
those portions of the auditory signal which are of interest which makes the
wide capacity of the ear maximally useful." [p. 471, 45]

4.3 Other Observed Rates of Information Transfer

Not all the experiments, or the observations taken, on rate of
information transfer have resulted in rates as high as those described
above. Evidently the mode of presentation of the information vitally affects
the rate at which it can be handled; if this conclusion is true, then the
naive program outlined at the beginning of this section must be modified
to some degree.

In this connection the results of an experiment performed by
Klemmer and Muller [4?] are of interest. The stimuli consisted of five lights
arranged in an arc; a corresponding set of telegraph keys was arranged
under the subject's fingers. The subject was to press the keys corresponding
to those lights which were on. By using various numbers of bulbs - the
subjects were told which would be employed - 1, 2, 3, 4, and 5 bits could
be achieved in the presentation. In addition, the stimulus cycle, which
consisted of lights on 50 per cent of the cycle and off the last 50 per
cent, was presented at a rate of 2, 3, 4, and 5 cycles per second. The
subjects were all trained on the apparatus for several weeks, and the
practice curves indicate that they had completely stabilized by the time
the experiment was performed.

For a fixed number of bits in the stimuli, it is found that by
varying the rate of information presented there is a nearly linear increase
in the transmitted information until a peak is reached, after which the
transmission rate falls markedly. The location of the peak, and hence its
value, is an increasing function of the number of bits in the stimulus.
The approximate values of the peaks are:

Information presented in bits/stimulus      1      2      3      4      5

Peak transmitted info. in bits/sec         2.7    4.6    5.6    8.4   10.5

The decay of the performance following the peak is remarkable.
In the case of a stimulus with 5 bits, the peak of 10.5 bits/sec occurs
when the input rate is approximately 13 bits/sec. When the rate is increased
to 15 bits/sec, the transmitted information has dropped to 6 bits/sec. This
drop is, of course, due to a radical increase in the error rate.

It should be mentioned that what we report are average results,
and the authors present data to show that there is considerable individual
variation.

Now, it is clear that the rates found in this experiment
are less than those described in section II.4.2 above. In many respects
this experiment and its conclusions are more closely related to those described
in the next section on reaction times than it is to either the reading,
typing, or adding experiments. One important difference is that in the latter
experiments the stimuli are before the subjects at all times and hence the
receptor mechanism can operate with a considerable lead over the response
mechanism, whereas such a large lead was not possible in Klemmer and Muller's
study. It therefore appears to be more nearly a 'continuously' executed
reaction-time experiment. This can be supported from data they present.
Typical reaction-time experiments were run on the same subjects, and a
comparison of the inverse of the reaction time to the stimulus rate (in
stimuli/sec) at peak transmission is revealing:

Bits in Stimulus
at peak
transmission


The Felton, Fritz, and Grier [18] study of communications at [illegible],
discussed earlier in Part II, yields some data on operational rates of
information handling. Using 'information elements' on which to base their
calculations, they found that during a single landing the following amounts
and rates of information were employed by pilots and towers:

[Table entries missing in the source.]
However, it will be recalled that they determined that there was a very high
redundancy in the transmission, and if only 'new' information is considered,
the table becomes:

            New information          Rate of new information
            transmitted in bits      transmitted in bits/sec

Air                 22                        1.6

Tower               29                        2.2

Either set of rates is below that which we have seen is possible for speech.

Hick writes, "As a personal speculation from such data as are
available, it seems likely that transmission rates fall into three fairly
distinct classes:

1. High rates of 10-15 bits per second.

2. Moderate - 5-6 bits per second.

3. Slow - 3-4 bits per second." [p. 63, 35]
He feels that these rates are closely correlated to the mode of presentation
of the information. High rates are obtained only through simple 'imitation'
codes of the type we learn in childhood. Moderate rates are typical of
'arbitrary' specially learned codes in which each signal has a high
information content. The low rates result from arbitrary codes having a low
information content per signal and a high rate of presentation. As a partial
and speculative explanation for rates less than full capacity Hick comments,
"But for various reasons I am inclined to suspect - I would certainly not
be more definite than that - that there is a tendency, overcome, if at all,
only with long practice, to sidetrack one or two bits per discrete movement
as a kind of monitoring feedback. It would be originally necessary in the
course of developing the skill (the code being, as stated above, relatively
arbitrary or 'unnatural'), and may be retained, perhaps as a habit, or
perhaps to keep the skill up to full efficiency, for a long time after
that." [pp. 70-71, 35]


5.  Reaction  Time  and  Information  Transfer 


Our present topic may, in a sense, be considered a continuation
of the last section on capacity; here we shall deal with what might be
called 'momentary' capacity. Previously we considered long samples of
sequential stimuli to which the subject responded more or less continuously;
now we shall consider his reaction time to a single isolated display. The
question is what characteristics of the display need be considered in order
to account (simply) for the observed reaction times. The hypothesis, very
generally, is that the information content of the display is the relevant
variable and that the reaction time will turn out to be a very simple
function of it - namely, linear.


There are, according to information theory, a number of ways in
which the information transmitted can be varied: a) by varying the number
of equi-probable alternatives, b) by altering the probabilities of the
various choices, c) by introducing sequential dependencies between choices,
and d) by allowing errors (noise) to occur. In the theory these are
equivalent. Whether they produce equivalent human behavior is an empirical
problem.
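
The first three ways of varying the information can be made concrete with a
short calculation. The following is only a sketch: the function names and
the example probabilities are illustrative assumptions, not values from any
of the experiments discussed, and sequential dependence is represented by a
first-order chain purely for simplicity.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sequential_entropy(stationary, transition):
    """Average entropy per symbol, in bits, of a first-order chain.

    stationary[i] is the long-run probability of symbol i and
    transition[i][j] is the probability that symbol j follows symbol i.
    """
    return sum(p * entropy(row) for p, row in zip(stationary, transition))


# a) eight equi-probable alternatives
h_equal = entropy([1 / 8] * 8)

# b) four alternatives with unequal (hypothetical) probabilities
h_unequal = entropy([0.5, 0.25, 0.125, 0.125])

# c) two alternatives, each tending to repeat itself (hypothetical values)
h_sequential = sequential_entropy([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])

print(h_equal, h_unequal, h_sequential)
```

In each case the quantity computed is the average information per stimulus;
the empirical question raised above is whether reaction time depends only on
this number or also on the way in which it was produced.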


In the first experiment of the series of three we shall discuss,
Hick [34] considered cases a and d. He presented subjects with a stimulus
in which one of n equally likely alternatives would arise, and the subject
had to respond as to which occurred. The hypothesis was that the reaction
time (RT) would be proportional to the information in the stimulus, or,
in other words, the rate of information transfer would be constant. There
is, of course, a difficulty in assuming RT = k log n, since when n = 1
this would require a zero reaction time. Hick suggests that there are
really n + 1 alternatives, since we have ignored the case of no stimulus.
While this seems reasonable, it is difficult to accept his assumption that
all n + 1 are equi-probable and that RT = k log(n + 1). However, he finds
that data taken by Merkel [64] are well fit by choosing k = 0.626 and that
his own are fit with k = 0.518. Since a fixed delay, independent of n,
seems plausible, the function a + k log n might seem intuitively more
suited to fitting the data, but it does not fit either set of data as well.
These fits were obtained with n in the range 1 to 10, i.e., up to a little
more than 3 bits.
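
The difference between the two forms at n = 1 can be seen numerically. This
is a minimal sketch: the function names are hypothetical, and the base of
the logarithm and the unit of time are not stated in the text above, so base
10 is used here only as an assumption.

```python
import math


def rt_log_n(n, k, base=10.0):
    """RT = k log n, the form that gives a zero reaction time at n = 1."""
    return k * math.log(n, base)


def rt_log_n_plus_1(n, k, base=10.0):
    """RT = k log(n + 1), the form Hick adopts."""
    return k * math.log(n + 1, base)


K_MERKEL = 0.626  # constant quoted above for Merkel's data
K_HICK = 0.518    # constant quoted above for Hick's own data

for n in range(1, 11):
    print(n, round(rt_log_n(n, K_HICK), 3), round(rt_log_n_plus_1(n, K_HICK), 3))
```

Whatever base is chosen, the second form is linear in log(n + 1), so the
implied rate of information transfer is the same for every n.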

Turning to method d of varying the information, Hick points out,
"if the subject can be persuaded to react more quickly, at the cost of
a proportion of mistakes, there will be a residual entropy which should
vary directly with the reduction in the average reaction time." [p. 1?, 34]
An experiment was performed in which the subjects were pressed, and the
errors were taken into account by computing an equivalent error-free n,
n_e. The reaction time data when plotted against n_e were found to be fit
pretty well by the curve obtained for the errorless case.
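
One way to picture an equivalent error-free n is sketched below. It is not
Hick's own computation, whose details are not given here. It assumes n
equi-probable stimuli, a fixed probability of error, and errors spread
evenly over the n - 1 wrong responses; the function name is hypothetical.
The residual entropy is subtracted from log n and the result converted back
to a number of alternatives.

```python
import math


def equivalent_error_free_n(n, error_rate):
    """Number of alternatives identified without error that would carry the
    same transmitted information, under the assumptions stated above."""
    if n < 2:
        raise ValueError("need at least two alternatives")
    e = error_rate
    residual = 0.0  # residual entropy (equivocation) in bits
    if 0.0 < e < 1.0:
        residual -= e * math.log2(e) + (1.0 - e) * math.log2(1.0 - e)
    if e > 0.0 and n > 2:
        residual += e * math.log2(n - 1)
    transmitted = math.log2(n) - residual
    return 2.0 ** transmitted


print(equivalent_error_free_n(8, 0.0))  # no errors: n is unchanged
print(equivalent_error_free_n(8, 0.1))  # ten per cent errors: fewer than 8
```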

As Hick's student Crossman states, "The original evidence that
the information measure was the appropriate one to use for interpreting
choice-reaction times was simply that the logarithmic function occurs in
both. This in itself is not strong, since logarithmic relations occur rather
often in biological measurement. The case became much stronger with Hick's
finding that the reduction in response-time when errors are permitted
obeyed the same law." [p. 41, 10]

In Hick's experiment the rate of information transfer was about
5.6 bits/sec, a value which is low compared with the largest obtained
using a 'continuous' stimuli presentation.

Hyman [41] has examined methods a, b, and c of varying the
information when the performance was kept errorless. He states his
hypotheses as:

"1) Reaction time is a monotonically increasing function of the
amount of information in the stimulus series.

"2) The regression of reaction time upon amount of information is
the same whether the amount of information per stimulus is varied by altering
the number of equally probable alternatives, altering the relative frequency
of occurrence of particular alternatives, or altering the sequential
dependencies among occurrences of successive stimuli." [p. 189, 41]

The stimulus presentation was by means of a matrix of lights
with a range of 0 to 3 bits. The subjects responded by means of a vocal
key, which seems to yield more precise measurements than the hand-operated
key of Hick's experiment. The subjects were given complete statistical
information about the stimulus, and before each test run they were given
sample sequences formed accordingly. Four subjects were used. The
correlations reported below are the average of the four correlations
computed for each subject separately.

In the first phase, the number of equi-probable alternatives
was varied and a correlation of 0.983 was found between reaction times
and information in the stimuli. This confirms Hick's results. In the second
phase, when the relative frequencies were changed, an average correlation
of 0.975 was found. In the third phase, introducing sequential dependencies
resulted in a correlation of 0.938. The last correlation is significantly
lower than the other two.
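
The correlations quoted are averages of correlations computed separately for
each subject. The sketch below shows that calculation on invented numbers;
the reaction times are hypothetical and are not Hyman's data, and the
function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Information per stimulus (bits) and mean reaction times (sec) for two
# hypothetical subjects.
bits = [0.0, 1.0, 2.0, 3.0]
reaction_times = [
    [0.21, 0.36, 0.52, 0.66],
    [0.19, 0.35, 0.48, 0.65],
]

per_subject = [pearson_r(bits, rts) for rts in reaction_times]
average_r = sum(per_subject) / len(per_subject)
print(per_subject, average_r)
```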

Hyman concludes from his data that his second hypothesis, while
not acceptable at the 1 per cent level, is acceptable at the 5 per cent level.

In discussing the second phase, he points out that the reaction
times of the less probable events were much longer than those of the more
probable ones, and that the reaction time used is actually a weighted mean
of these. Crossman [10] examined this phenomenon in greater detail as
another test of the central hypothesis. "When a subject responds to a
sequence of signals all of which belong to a known set but some of which
occur more frequently than others, his average response-time will be
proportional to the average information per signal. This follows from the
hypothesis that the subject deals with information at a constant rate."
[p. 41, 10] To test this he used a sorting task on ordinary playing cards
and by varying the dimensions on which they were to be sorted he was able to
examine the reaction times over a range of 0 to 2 bits/card. The correlation
between reaction time and information in a card was 0.66, and when the
data are plotted it appears that no single curve will fit them better than
a straight line.

Crossman adduced evidence to show that the deviations from
linearity were due to differential difficulties in discriminating the cards
in different classes. On the basis of this he made the important observation
that "there is a major difficulty in the use of information theory in
psychology, for information theory, in the discrete case stated by Shannon,
says nothing about actual signals and the process of distinguishing them one
from another; it deals only with abstract symbols already identified and
distinct." [p. 49, 10] This, of course, suggests carrying out a similar
experiment using only one dimension of discrimination and causing the
entropy to vary along it. This was done and the fit was improved.

On the basis of his data, Crossman concluded "... our hypothesis
that rate is constant under variation of relative probabilities is upheld
by these observations, with the proviso that 'discriminability' of signals
should be equal in a sense yet to be precisely defined." [p. 50, 10]

From these data it seems reasonable to conclude tentatively that
the rate of information transfer in a reaction time experiment is constant
when the information in the stimulus is in the range 0 to 3 bits. Since
this conclusion is not in conformity with the observations made with a
'continuous' stimuli presentation, it would certainly be interesting to see
whether the rate remains constant when there are more than 3 bits in the
stimulus, and also to see whether an experiment can be found with the rate
constant, but much larger than 5 bits/sec, for the range 0 to 3 bits.

6. Visual Threshold and Word Frequencies

In recent years there has been a series of experiments
relating the visual threshold of word recognition (as given by
tachistoscopic experiments) to the frequency of their occurrence.
Originally, the program stemmed from work on the Bruner-Postman hypothesis
that sentences which relate to things liked are recognized with less
difficulty than those relating to things disliked. Evidence has accumulated
that the major relation is actually between recognition speed and the
frequency of occurrence of the word in the language. Howes [39] cites data
involving sentences, and Howes and Solomon [40] similar data involving only
words. In the latter case, word counts were obtained from Thorndike and
Lorge [97] and there was found to be a correlation of about -0.7 between
recognition time and the logarithm of word frequency. Howes [39] and
Miller [68] describe data taken by Solomon in which seven-letter Turkish
words were used. These were written on cards which the subjects studied.
Some words appeared on many cards, others on only a few, so there was
differential exposure to these new words. A correlation of -0.96 was found
between recognition time and log frequency. King-Ellison and Jenkins [47]
repeated Solomon's experiments with some slight variations, including the
use of artificial five-letter words, and they obtained a correlation of
-0.99. They point out that a relationship to information theory is
suggested, namely, that recognition time is a linear function of the
information transmitted by a word. The earlier comment we quoted from
Crossman, namely, that logarithmic relations are so common in biology and
psychology that more must be established before an information theoretic
model is assumed, is relevant here. Further studies appear to be needed.

7. The Information Transmitted in Absolute Judgments

When a subject is required to place stimuli which vary along one
dimension, such as pitch or loudness, into simply ordered categories, such
as the first M integers, then he is said to be making absolute judgments of
the dimension of the stimuli. For example, the stimuli might be pure tones
at 100, 150, 200, ..., 1,000 cycles/sec. Each time a tone is presented he
must place it in a category as accurately as he can. It is clear that in
general errors will occur of the form: a tone with a lower frequency than
another will be put in a higher number category. It is also clear that the
error rate can probably be diminished by reducing the number of categories.
For example, if he must place the above stimuli in 21 categories, we may
expect more errors than if he need only report whether a signal is below
or above 500 cycles/sec, for then there will be little ambiguity in his
mind except for those stimuli near 500 cycles. Such experiments have a
long history, but there has always been some difficulty in summarizing the
data - just how should the error picture be summarized?

Garner and Hake [27] pointed out that the matrix relating input
stimuli to response categories, with the entries the frequencies of pairings
between a stimulus and a category, can be treated (with the obvious
normalization) as a noise matrix for a communication system, where the
communication is of selective information from the stimuli to the
experimenter via the subject as a channel. We may, therefore, compute the
information of the stimuli set (which, of course, depends on the relative
frequencies of presentation of the different stimuli) and the equivocation
of the transmission, and the difference is the information transmitted. If
for a certain type of absolute judgment it is found that 20 categories allow
the transmission of 3 bits, then in principle as much can be transmitted
using only 8 unambiguous categories, since 3 bits is the logarithm to the
base 2 of 8. Choosing the categories so that there is no ambiguity, i.e.,
no errors, may be difficult, but Garner and Hake point out that if the
errors have a Gaussian distribution the condition is equivalent to
a criterion of equal discriminability.
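
The computation described, the information of the stimulus set less the
equivocation, can be sketched as follows. The matrix layout (rows for
stimuli, columns for response categories), the function name, and the
example counts are assumptions made for illustration and are not data from
Garner and Hake.

```python
import math


def transmitted_information(counts):
    """Information transmitted, in bits, from a stimulus-response matrix.

    counts[i][j] is the number of times stimulus i was placed in response
    category j.
    """
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]

    # Information of the stimulus set.
    stimulus_information = -sum(
        (r / total) * math.log2(r / total) for r in row_totals if r
    )

    # Equivocation: uncertainty about the stimulus once the response is known.
    equivocation = 0.0
    for row in counts:
        for cell, col_total in zip(row, col_totals):
            if cell:
                equivocation -= (cell / total) * math.log2(cell / col_total)

    return stimulus_information - equivocation


perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
confused = [[6, 4, 0, 0], [4, 6, 0, 0], [0, 0, 6, 4], [0, 0, 4, 6]]
print(transmitted_information(perfect))
print(transmitted_information(confused))
```

With four equally frequent stimuli the first matrix transmits the full
2 bits; in the second, confusions within each pair of neighbouring stimuli
reduce the figure to little more than 1 bit.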

In another paper [30] they cite the major difference between the
usual error analysis for experiments of absolute judgments and the proposed
information theory analysis: an error analysis ignores the fact that if
the error distributions do not overlap, there will be no ambiguity. The
information analysis takes this into account, but, unlike the error
analysis, it completely ignores the magnitude of the errors. There are some
applications where it is preferable to have a multitude of small errors,
provided that there is never a single major one.

A number of applications of this proposal have been made to different
classes of absolute judgments. Pollack [76] studied tones which were spaced
equi-distantly on a logarithmic frequency scale from 100 to 8,000
cycles/sec. The subjects had to assign a number to each tone presented.
When there were 2 and 4 tones in the stimulus set, the transmission was
perfect, 1 and 2 bits respectively. But with 8 and 16 tones, the curve
became flat, and the average maximum transmission was 2.3 bits, or the
equivalent of perfect identification among about 5 tones. The best subjects
reached the equivalent of only 7 tones. On the grounds that there are known
to be 50 to 60 identifiable sounds associated with speech and music,
Pollack felt that there must have been a serious underestimation of the
information transmitted, and so he performed a series of auxiliary
experiments to attempt to raise the value. Six different partitions of the
frequency space were examined, and the frequency range was varied with the
bottom held at 100 cycles/sec and the top moved from 500, 2,000, 4,000, and
8,000 cycles/sec. These variations resulted in only a few percentage points
change in the information transmitted. He suggests that the result is so
low because of the acute sensitivity of the information measure to error,
which we have mentioned earlier (section 3).

Halsey and Chapanis [32] have presented similar data on the number
of absolutely identifiable spectral hues, and though they do not apply an
informational analysis, their findings are of some interest. The colors
were identified sequentially from violet to red by numbers, and the subjects
were familiarized with the number-color code until learning was completed.
In a test using 10 hues and 20 judgments per hue, they found that two
observers were correct in 97.5 per cent of the judgments. These hues were
selected on the basis of several earlier experimental runs in which more
hues were employed, but a lower accuracy was obtained. They note that
absolute identifiability of 10 hues is considerably better than had been
previously reported, but they attribute this mainly to different
experimental conditions.

Hake and Garner [30] applied the information theory analysis
"... to determine the minimum number of different pointer positions which
can be presented in a standard interpolation interval to transmit the
maximum amount of information, not about which positions of the pointer are
occurring, but about the event continuum being represented." [p. 358, 30]
Two variations were run: in the limited response case the subjects were
told the values the pointer could assume and they were required to respond
only with those numbers; in the unlimited response case no such restriction
was made. 10, 20, and 50 possible pointer positions were used, and the
data are summarized below.

Number of Positions          Information Transmitted in Bits

[Table entries missing in the source.]


We observe that beyond 10 pointer positions the amount of information
transmitted is roughly constant - equivalent to about 10 error-free
positions. There seems to be little or no difference between limited and
unlimited response as far as this analysis is concerned, but Hake and Garner
point out that an error analysis shows that the errors increase when the
subjects are allowed unlimited response.

In a later paper, Garner [26] comments: "A measure of information
transmission provides a means of specifying perceptual and judgmental
accuracy in situations where absolute judgments about various categories on
a stimulus continuum are required. This measurement allows the
determination of the maximum number of stimulus categories which could be
used with perfect accuracy without the necessity of sampling all the
possible numbers of categories. However, this use of information
transmission requires the assumption that the inherent judgmental accuracy
is independent of the number of stimulus categories used experimentally.
Two experiments (Garner and Hake and Hake and Garner) have shown that this
assumption is quite valid for situations involving judgments of position in
visual space, and Pollack's experiment demonstrates its validity for
judgments of pitch." [p. 373, 28] Garner then proceeded to examine its
validity in judgments of loudness of tones using 4, 5, 6, 7, 10, and 20
categories. He found that judgment accuracy was nearly perfect for 4 and 5
categories (perfect being 2 and 2.32 bits respectively), but that it had
dropped to 1.62 bits for 20 categories, which is equivalent to perfect
accuracy for only three categories. Thus the assumption is apparently not
valid for loudness.
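
The conversions between bits transmitted and an equivalent number of
perfectly identified categories used in this section are simple powers and
logarithms to the base 2. The following short check uses hypothetical
function names; the input figures are those quoted in the text.

```python
import math


def equivalent_categories(bits):
    """Number of categories that, identified without error, carry `bits`."""
    return 2.0 ** bits


def perfect_transmission(categories):
    """Bits transmitted when equally likely categories are never confused."""
    return math.log2(categories)


print(equivalent_categories(2.3))    # Pollack's average maximum for tones
print(equivalent_categories(1.62))   # Garner's 20 loudness categories
print(perfect_transmission(4), perfect_transmission(5))
```

The values obtained are close to 5 tones and 3 categories respectively, and
2 and 2.32 bits for perfect use of 4 and 5 categories.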

He went on to show, however, that the information transmitted
could be improved if both the observers, i.e., the subjects, and the stimuli
were taken as inputs to the system and the responses as outputs. (See
section I.5.2 for the analysis procedure when there are more than two
dimensions.) In other words, there was considerable variability between
subjects when a large number of categories was employed. A further raising
of the information transmitted, so there is no drop at all, is achieved
if the stimuli, the observers, and the preceding stimulus are all taken as
inputs to the system.


Klemmer and Frick [48] carried out a similar experiment and
analysis but with two and three stimulus dimensions instead of one. They
flashed (0.03 sec) a display consisting of white dots on a black background
to subjects who marked on answer sheet grids what they thought the position
of the dots to be. The experiment was run both with and without grid lines
on the black background, and there was not found to be an appreciable
difference. Restricted to the presentation of one dot, the information in
the stimulus could be varied by changing the order of the matrix of
possible positions. From 3.2 bits (order 3) to 5.2 bits (order 6) there
was an increase in information transmitted from 3.2 to 4.4 bits. From
5.2 bits to 8.6 bits (order 20) in the display, the information
transmitted remained approximately constant.

In addition, the number of dots presented was varied, and it
was found that by using 4 dots and a matrix of order 3 (7.0 bits) 6.6 bits
were transmitted. Further, when from 1 to 4 dots were used, then a display
having 8.0 bits resulted in almost perfect transmission - 7.8 bits. "It
is clear that the maximum amount of information that can be assimilated
from a brief visual exposure is a function of the type of encoding used.
The question immediately arises as to whether or not there is a common
metric which may be applied to the different message classes and which will
correlate with the maximum information-carrying capacity of that class."
[p. 15, 48] They observe that using only one dimension or coordinate (the
location of a point on a line) Hake and Garner found a maximum transmission
of 3.1 bits, using the two coordinates of a matrix they obtained a maximum
of 4.4 bits, and using the two coordinates of a matrix plus the one of the
number of dots, they found 7.5 bits transmitted. This suggests that the
maximum increases with the number of dimensions.
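
The stimulus information figures quoted for the dot displays follow from
counting the possible displays, on the assumption (made here for
illustration) that every admissible arrangement of dots is equally likely.
The function name is hypothetical.

```python
import math


def display_bits(order, dot_counts):
    """Information, in bits, in a display of dots on an order-by-order matrix.

    dot_counts lists the numbers of dots that may appear; every arrangement
    with an allowed number of dots is taken to be equally likely.
    """
    cells = order * order
    displays = sum(math.comb(cells, k) for k in dot_counts)
    return math.log2(displays)


print(round(display_bits(3, [1]), 1))           # one dot, order 3
print(round(display_bits(6, [1]), 1))           # one dot, order 6
print(round(display_bits(20, [1]), 1))          # one dot, order 20
print(round(display_bits(3, [4]), 1))           # four dots, order 3
print(round(display_bits(3, [1, 2, 3, 4]), 1))  # one to four dots, order 3
```

The five results agree with the 3.2, 5.2, 8.6, 7.0, and 8.0 bits stated in
the text.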

In this connection, Christie and Luce [9] have suggested that a
careful analysis of the distribution of disjunctive reaction times in simple
choice situations - like the ones described above - may permit a model of
the 'mental' or 'internal' structuring of simple decision processes. They
suggest representing this structuring by a flow diagram (also called a
network or an oriented graph) which indicates the general temporal
organization of certain gross internal processing of the information. Two
special and extreme cases are serial and parallel processing, which are
diagramed in the figure. In some highly speculative comments they suggest
that parallel processing may be carried out on information which is
presented in what we intuitively call several different dimensions, and that
serial processing is effected on information lying in one dimension. With
some simple assumptions, they show that such a model has the appropriate
information transmission properties for a matrix display - at least up to
6 or 8 bits. If the technique they suggest - which we shall not detail
here - is practical, then it may serve to give an empirical definition of
what psychological dimension means.

[Figure: flow diagrams from Stimulus to Response for serial and parallel
processing.]

On the basis of the several experiments we have discussed, one
can conclude that for objective ratings there is, up to a point, an increase
in the information transmitted with an increase in the number of categories,
and after that point the information transmitted either remains constant
or decreases. Bendig and Hughes [3] raised the question whether the same
conclusion is possible for ratings of subjective feelings. To study this,
they had subjects evaluate, according to either 3, 5, 7, 9, or 11
categories, their knowledge of 12 different countries. Anchoring statements
of the form "I know (a great deal) (something) (very little) about this
country" were employed in three variations: center anchored, both ends
anchored, and both ends and the center anchored. Information transmission,
they found, was increased by an increase in the verbal structuring of the
scale, i.e., by the anchoring, but the increase was not very marked. With
the anchoring held constant, there was a nearly rectilinear increase of
information transmitted with an increase of number of scale categories,
except that there was a deceleration in the step from 9 to 11 categories.
This effect is in accord with the diminishing return observed for objective
scaling.


8. Sequential Dependencies and Immediate Recall, Operant Conditioning,
Intelligibility, and Perception

One of the main points of the 1949 Miller and Frick [65] paper was
to bring to the attention of psychologists that in information theory they
had a tool ideally suited to the characterization of sequential dependencies
in the stimulus, in the response data, or in both. There appear to have
been four areas of psychological study to which this observation has been
applied: to the learning of written material as a function of the
statistical dependencies in those materials, to the sequential responses
obtained in operant conditioning, to the intelligibility of verbal material
as a function of statistical dependencies within the material, and to the
ability of subjects to perceive statistical dependencies in materials. We
shall discuss them in that order.


8.1 Immediate Recall

"Briefly stated, the problem [...] is, How well can people remember
sequences of symbols that have various degrees of contextual constraint in
their composition? The experimental literature contains considerable
evidence to support the reasonable belief that nonsense is harder to remember
than sense. This evidence has suffered, however, from a necessarily
subjective interpretation of what was sensible." [p. 179, 66] Using Shannon's
method, Miller and Selfridge [66] prepared Nth order approximations to
English in the following manner. A sequence of N successive words was
chosen at random from a connected text, and a subject was asked to imbed
the passage in a meaningful sentence. The first word in his sentence
following the original group of N words was recorded. To the next subject
was presented the last N-1 words of the original passage plus the new word,
and he placed this N-word passage in a sentence. The first word after the
passage was recorded, and so on. In this manner they generated approximations
of order 0, 1, 2, 3, 4, 5, and 7 in passages of 10, 20, 30, and 50 words in
length. Using these approximations to English, plus meaningful text, a
standard recall experiment was executed. With the passage length held constant,
they found that the percentage of recall increases with an increase in the
order of approximation to English. In particular, for the 30 and 50 word
passages the recall of the 5th and 7th order approximations to English is
very little different from the recall of text material of the same length -
this notwithstanding the fact that the 5th order is quite nonsensical and
the 7th order would by no means be considered English. With shorter passages,
recall comparable to that of text was achieved for even lower values of N.
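
The chaining procedure can be imitated mechanically if a text corpus is allowed to stand in for the human subjects. The sketch below is only an illustration under that assumption and is not Miller and Selfridge's procedure: the function names, the `context` parameter (the number of preceding words carried forward), and the restart rule for dead ends are all hypothetical.

```python
import random
from collections import defaultdict


def build_followers(words, context):
    """Map each run of `context` consecutive words to the words that follow it."""
    followers = defaultdict(list)
    for i in range(len(words) - context):
        followers[tuple(words[i:i + context])].append(words[i + context])
    return followers


def approximate_english(words, context, length, seed=0):
    """Chain words so that each new word depends on the `context` words before it.

    Assumption: a corpus lookup replaces the subject who completes a sentence.
    With context == 0 the words are drawn independently from the corpus.
    """
    if len(words) <= context:
        raise ValueError("corpus must be longer than the context")
    rng = random.Random(seed)
    if context == 0:
        return [rng.choice(words) for _ in range(length)]
    followers = build_followers(words, context)
    start = rng.randrange(len(words) - context)
    passage = list(words[start:start + context])
    while len(passage) < length:
        options = followers.get(tuple(passage[-context:]))
        if options:
            passage.append(rng.choice(options))
        else:
            # Dead end (context occurs only at the end of the corpus): restart.
            start = rng.randrange(len(words) - context)
            passage.extend(words[start:start + context])
    return passage[:length]
```

Raising `context` makes the output preserve longer runs of the source text, which is the sense in which higher orders retain more short range associations.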

"The results indicate that meaningful material is easy to learn,
not because it is meaningful per se, but because it preserves the short
range associations that are familiar to the Ss. Nonsense materials that
retain these short range associations are also easy to learn. By shifting
the problem from 'meaning' to 'degree of contextual constraint' the whole
area is reopened to experimental investigation." [p. 183, 66] For example,
one may ask whether their conclusion is valid for the whole memory decay
curve, or whether it holds only for short term memory.

Similar results have been found by Aborn and Rubenstein [1] in a
slightly different experimental situation. They devised an 'alphabet' of
16 nonsense syllables which fell into four easily distinguished classes of
four syllables each; this classification was shown to the subjects. From
these syllables six classes of passages of 30-32 syllables were constructed.
The members of the first class were formed by random selection of syllables,
and the others had increasing amounts of organization. For example, class
four passages were marked by commas into groups of four syllables, and the
first syllable of each group was chosen from class one, the second from
class two, etc. The subjects were allowed 10 minutes to study the formal
organization of the passage on which they would be tested and then three
minutes to learn the actual passage, after which they were asked to reproduce
it as accurately as possible. The authors had two hypotheses: "(a) The
amount of learning in terms of syllables recalled is greater as the
organization of the passage is greater, i.e., as the average rate of
information is smaller. (b) The amount of learning in terms of the
information score, computed as the product of the number of syllables
recalled and the average rate of information, is constant for all
passages." [p. 261, 1] The data verified the first hypothesis, but not the
second. For the first four passages the total amount of information learned
was constant, but it dropped in passage 5 and even more so in passage 6.
The breaking point was between 1.5 and 2 bits/syllable. This result simply
means that the subjects were unable to memorize enough syllables to keep
the information score high when the information per syllable was very low.
Both these findings are in conformity with those of Miller and Selfridge
above.
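
The arithmetic behind hypothesis (b) can be sketched as follows. The two rates follow from the construction described above (a free choice among 16 syllables, or a choice within a known class of four); the recall counts are invented purely for illustration and are not Aborn and Rubenstein's data, and the function names are hypothetical.

```python
import math


def rate_bits_per_syllable(alternatives):
    """Average information per syllable when `alternatives` are equally likely."""
    return math.log2(alternatives)


def information_score(syllables_recalled, bits_per_syllable):
    """Hypothesis (b): syllables recalled times the average rate of information."""
    return syllables_recalled * bits_per_syllable


random_rate = rate_bits_per_syllable(16)     # random passage: any of 16 syllables
class_four_rate = rate_bits_per_syllable(4)  # position fixes the class of four

# Hypothetical recall counts, chosen only to show what a constant score means.
print(information_score(6, random_rate))       # 24.0
print(information_score(12, class_four_rate))  # 24.0
```

Under hypothesis (b), halving the rate would have to be offset by doubling the number of syllables recalled, which is the demand the subjects could not meet at the lowest rates.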

8.2  Operant  Conditioning 

Frick and Miller [19] have reported an application of their
earlier ideas for the measurement of stereotypic behavior [65] to the operant
conditioning of rats in a Skinner box. Two responses were observed: approach
to food (A) and bar pressing (B). "Instead of the usual analysis in terms of
the rate of responding to the bar, the results are analysed here in terms of
the patterns of responses." [p. 21, 19] Three experimental phases were
considered separately in the analysis: a) behavior prior to conditioning
(operant level), b) conditioning behavior, and c) extinction behavior. During
phase b a total of 300 reinforcements was applied.

In all phases the behavior was recorded as sequences of A's and
B's, and the uncertainties - in terms of the index of behavioral stereotypy -
were computed. It was found that 'intersymbol' influences did not extend
appreciably beyond two symbols, and the value of the uncertainty in phase a
was 0.408 for two symbols. Such a high value when there has been no
conditioning is a consequence of the fact that such a sequence as AAAA had a
probability of 0.732 of occurring; indeed, the behavior of the rats was
more stereotyped before conditioning than after. "The training period did
not introduce order into randomness, but rather caused the animal to abandon
one well organized pattern of behavior for another. This needs some
qualification. The lower stereotypy after conditioning appears when we consider
only the temporal order; when we try to predict which response comes next.
If we tried to predict also when the next response would occur and how long
it would last, then the conditioned behavior would look less random than the
pre-conditioned behavior." [p. 25, 19]
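
A stereotypy measure of this kind can be sketched for a recorded sequence of A's and B's. The sketch assumes the index is one minus the ratio of the observed uncertainty to its maximum (one bit per response when there are two response types); that definition is supplied here as an assumption and is not stated in the text above, and the example sequences are invented.

```python
import math
from collections import Counter


def conditional_uncertainty(sequence):
    """Uncertainty (bits) of the next response given the previous one."""
    pairs = Counter(zip(sequence, sequence[1:]))
    firsts = Counter(sequence[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (prev, _next), count in pairs.items():
        h -= (count / total) * math.log2(count / firsts[prev])
    return h


def stereotypy_index(sequence, n_responses=2):
    """Assumed form: 1 - U / U_max, with U_max = log2(number of response types)."""
    return 1.0 - conditional_uncertainty(sequence) / math.log2(n_responses)


print(stereotypy_index("ABABABABAB"))    # strict alternation: fully predictable
print(stereotypy_index("AABABBBABAAB"))  # less orderly sequence: lower index
```

Note that strict alternation scores as maximally stereotyped, just as a long run of A's would, since the index reflects predictability of temporal order and nothing else.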

Another simple way the data may be described is as points in a
two-dimensional plot of p(B|B) vs p(A|A). In phase a of the experiment the
rats were approximately at the point (0.9, 0.7). This high perseveration is,
in large part, simply a reflection of the topography of the Skinner box, as
can be seen from the fact that 96 per cent of the responses separated by
less than 10 seconds are of the form AA and BB, while this is reduced to
52 per cent for responses separated by more than 80 seconds.

During conditioning the rats initially move down the plot and then
curve slowly over to an equilibrium point of about (0.1, 0.1), as shown in
figure 6.

During the extinction period the path of a rat in this space is not
very clear. There appears to be an initial tendency toward the center
(0.5, 0.5) of the plot, or random behavior, but there is considerable random
variation over a large portion of the plot. Over a 36 hour period there is a
drift toward the initial resting point, but no stability is achieved in that
period comparable to that prior to conditioning. It was not determinable
from this data how long it takes for the effects of reinforcement to wear
off. As in phase a, there is little difference between the uncertainty
determined from two successive responses and from more than two, and after
some extinction there is little or no difference in the index based on a
single response and that based on successive pairs of responses.

"The data presented and analysed [in this paper] do not provide
any startling new insights into operant conditioning. Most of the conclusions
seem perfectly reasonable and obvious to anyone who has worked with rats
in a similar situation and observed their general behavior closely. The
impressive feature of such an analysis is the extent to which the qualitative
aspects of the behavior can be incorporated into a completely quantitative
account." [p. 35, 19]

8.3  Intelligibility

The data on the effects of sequential dependencies on intelligibility
are less detailed, but there is an experiment by Miller, Heise, and
Lichten [69] in which certain gross effects were ascertained. They explored
the effects of three different contexts on intelligibility, namely: the test
item is known to be one of a small vocabulary of possible items, the test
item is imbedded in either a word or a sentence, and the test item is known
to be a repetition of the preceding item. The materials used were digits,
words in sentences, and nonsense syllables, and it was found that
intelligibility decreased in that order. Further, the intelligibility of
monosyllables, isolated words, and words in sentences was found to increase
in each case as the domain of possible items was decreased. Only a very
slight increase in intelligibility resulted from the knowledge that the item
was a repetition of the preceding one. "The results indicate that far more
improvement in communication is possible by standardizing procedures and
vocabulary than by merely repeating all messages one or two times."
[p. 335, 69] This conclusion seems to confirm the military practice of using
standardized language when conditions are adverse, as in air traffic control
(see section II.2.ii).

8.4  Perception

Hake and Hyman [31] raised the question of just how well and in
what way people perceive sequential dependencies which are built into a
set of stimuli, and they chose to summarize their results in terms of certain
conditional uncertainties - entropies - of the subject responses. The
experiment was divided into four series of runs. Each run consisted of
240 presentations of one or the other of two symbols (H and V), and these
presentations were generated according to the following probabilities and
conditional probabilities:


Series       1      2      3      4

p(H|H)      .50    .80    .75    .90
p(V)        .50    .50    .25    .25
p(V|V)      .50    .80    .25    .70
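
A run of this kind can be sketched as a two-state chain in which each symbol repeats with its own probability. The repeat probabilities below are read from the table above; the function names, the choice of starting symbol, and the seed are assumptions made for illustration. The same two numbers also fix the long-run proportion of V, which gives a check on the p(V) row.

```python
import random

# (p(H|H), p(V|V)) for the four series, as read from the table above.
SERIES = {1: (0.50, 0.50), 2: (0.80, 0.80), 3: (0.75, 0.25), 4: (0.90, 0.70)}


def stationary_p_v(p_hh, p_vv):
    """Long-run probability of V for a two-state chain with these repeat probabilities."""
    return (1.0 - p_hh) / (2.0 - p_hh - p_vv)


def generate_run(p_hh, p_vv, length=240, seed=0):
    """Generate one run of H/V presentations (first symbol drawn from the long-run law)."""
    rng = random.Random(seed)
    symbol = "V" if rng.random() < stationary_p_v(p_hh, p_vv) else "H"
    run = [symbol]
    while len(run) < length:
        repeat = p_hh if symbol == "H" else p_vv
        if rng.random() >= repeat:
            symbol = "V" if symbol == "H" else "H"
        run.append(symbol)
    return "".join(run)


for series, (p_hh, p_vv) in SERIES.items():
    print(series, round(stationary_p_v(p_hh, p_vv), 2))
```

The long-run values come out as .50, .50, .25, and .25, in agreement with the p(V) row.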

Prior to each presentation the subjects were required to predict,
or guess, which symbol would occur. The problem of analysis is to determine
how accurately we can predict his guesses when we know certain past events,
such as guesses and the symbols which actually occurred. For the last
120 trials the following conditional entropies were determined: the entropy
of the guess y when only the distribution of y is known - H(y), the entropy
of y when the distribution of y and the previous guess are known - H_y(y),
the entropy of y when the distribution of y, the previous guess, and the
previous occurrence are known - H_xy(y), the entropy of y when the
distribution of y and the previous occurrence are known - H_x(y), and the
analogues of each of the last three for the two preceding trials instead of
just one. These data are summarized:


Series          1      2      3      4

H(y)           1.00   1.00    .76    .80
H_y(y)         1.00    .83    .72    .75
H_x(y)          .99    .69    .74    .70
H_xy(y)     [illegible] .54   .68    .55
H_yy(y)        1.00    .83    .72    .73
H_xx(y)         .98    .55    .70    .56
H_xxyy(y)       .95    .52    .66    .55

(Doubled subscripts denote conditioning on the two preceding trials.)

It is clear that the best prediction of the subject's guess, i.e., the lowest
entropy, is obtained when both his guesses and the actual occurrences on
the two preceding trials are known, but a knowledge of his guess and the
actual occurrence on the single preceding trial yields a prediction which
is nearly as good, and knowledge of only the occurrence on the two preceding
trials is only slightly worse. It then follows that a subject responds
not only to the actual events which occurred but also to his predictions
about them. This can be made quite apparent by computing the probability
of a guess of H when on the preceding trial a correct guess of H was made. For
series one this conditional probability is about 0.5, but for the other three
series it rises over trials and from trial 100 on it remains approximately
constant with a value of 0.9. When the probability of an H guess following
two successive correct H guesses is plotted, the curves rise more rapidly,
and even in series one there is a rise from 0.5 to about 0.75.
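
Conditional entropies of this kind can be sketched for any pair of aligned records of guesses and occurrences. The sketch is a plain frequency estimate over the single preceding trial, not Hake and Hyman's own computation; the function names and the dictionary keys are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(targets, *contexts):
    """Entropy (bits) of `targets` given the aligned context sequences."""
    keys = list(zip(*contexts)) if contexts else [()] * len(targets)
    joint = Counter(zip(keys, targets))
    marginal = Counter(keys)
    n = len(targets)
    return -sum(c / n * math.log2(c / marginal[k]) for (k, _), c in joint.items())


def guess_entropies(guesses, events):
    """H(y), H_y(y), H_x(y), H_xy(y) using the single preceding trial."""
    y = guesses[1:]        # guess on the current trial
    prev_y = guesses[:-1]  # guess on the preceding trial
    prev_x = events[:-1]   # symbol that occurred on the preceding trial
    return {
        "H(y)": conditional_entropy(y),
        "H_y(y)": conditional_entropy(y, prev_y),
        "H_x(y)": conditional_entropy(y, prev_x),
        "H_xy(y)": conditional_entropy(y, prev_x, prev_y),
    }
```

A subject who always guessed the symbol that had just occurred would give H_x(y) = 0; adding further context can only lower such an estimate, never raise it, which is why the rows of the table decrease as more of the past is taken into account.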

"It appears from our evidence that Ss do not, in fact, perceive
the probability rules by which the series of events was generated. They
do perceive, instead, those short sequences of events which precede each
prediction, which can be discriminated from other possible sequences, and
which are found to provide some information about the future behavior of the
symbol series. There are several interesting conclusions which we can make
about the way in which Ss perceive these previous events:

"1. All combinations of possible previous events were not
discriminated with equal ease. Some previous events, especially homogeneous
runs of the same symbol, were more easily discriminated and consistently
responded to than were others.

"2. The previous events to which our Ss responded on each trial
included more than just the symbols which had been appearing. They included
also the previous predictions of Ss and the [degree] of correspondence between
their predictions and the symbols which appeared on previous trials.

"3. There was considerable agreement among our Ss as to when
a particular symbol should be predicted. They tended to respond to some
similar or identical previous events in the same way, no matter which
series they were predicting..." [p. 72, 31]

9.  Immediate  Recall  of  Sets  of  Independent  Selections

The subject of this section is closely related to that of II.8.1;
it was not included there since the main emphasis of that section was on
the effects of inter-symbol dependencies on immediate recall, whereas here
we shall examine the effects of message length and the bits/symbol when
there are no dependencies among symbols. Pollack [77] prepared messages
of from 4 to 24 symbols from sets of 2, 4, 8, 16, and 30 equiprobable
English consonants and numerals. These were read in a uniform manner to
subjects who were told in advance both the set of symbols and the message
length, and they were required to reproduce them as accurately as possible.
When an error was made, the subject was requested to guess as many times
as was necessary to obtain the correct response. In one version of the
experiment, reading rates were varied, but "Rate of presentation of stimulus
materials (over the range considered) appears as a variable with little
significance for immediate recall under the conditions considered here."
[p. 13, 77, II]

The data show that the error entropy per message unit increases
both with message length and with an increase in bits/symbol, but that for
a message of given length the percentage of presented information which is
lost is approximately independent of the number of bits/symbol. This
percentage is, however, an increasing function of the length of the message.
The error entropy increased in such manner that the total information
transmitted increased as the message length was increased from 4 symbols to
about 10, it remained roughly constant in the range of 10 to 16 or 18
symbols per message, and it decreased for longer messages. The curves are
displaced upward with an increase in bits/symbol, but they are of remarkably
similar shape. "The main generalization is that one cannot obtain
simultaneously both minimum information loss and maximum information gain by
simply varying either the length of a message or the number of possible
alternatives per message-unit." "These relations stem from the fact that
the percentage of the information presented that is lost or gained is
independent of the number of alternatives per unit and is simply a function
of the length of the message." [p. 12, 77, I]

It is useful to transform these data into plots of error entropy
and information transmitted vs total informational input. It is then found
that for a fixed input, the error entropy is smaller and the information
transmitted is larger the larger the number of bits/symbol. Thus, as
Pollack points out, if one is interested in the optimal encoding
characteristics for messages of fixed length, there are two answers, depending
on whether a high error count is tolerable or not. If, however, the question
is what are the optimal encoding characteristics (for immediate recall)
for messages of fixed informational content, then the answer is unequivocal:
short messages with a large number of alternatives for each message unit.
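
The trade-off between message length and number of alternatives at fixed informational content is a matter of simple arithmetic, sketched below. The sketch assumes equiprobable, independent symbols and takes information transmitted to be input minus error entropy; the 24-bit figure and the function names are chosen only for illustration and are not Pollack's.

```python
import math


def informational_input(length, alternatives):
    """Total input in bits for a message of equiprobable, independent symbols."""
    return length * math.log2(alternatives)


def information_transmitted(length, alternatives, error_entropy_per_unit):
    """Assumed form: input minus the information lost (error entropy per unit)."""
    return informational_input(length, alternatives) - length * error_entropy_per_unit


# Message lengths that carry the same 24-bit input with different alphabets.
for alternatives in (2, 4, 8, 16):
    length = 24 / math.log2(alternatives)
    print(alternatives, length)
```

The same 24 bits need 24 binary symbols but only 6 symbols drawn from 16 alternatives, and it is the shorter message that the recall data favor.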

In parts III and IV of his report, Pollack systematically studied
the error behavior of his subjects. First, his data confirm the familiar
finding of this type of experiment that the subjects are most uncertain about
the middle portion of the message. For messages of length 7, the relative
uncertainty of the middle symbols is slightly higher than the end uncertainty,
but it never exceeds .30. However, for messages of length 24, there is a
broad plateau in the middle of the message which has a relative uncertainty
of about .30. The broadness of this plateau Pollack attributes to the
great sensitivity of the information measure to errors. He notes that
this uncertainty curve alters its character with increasing message length:
for short messages it is positively skewed and for long ones it is negatively
skewed.

In the fourth part of the report, he established the conclusion
that there is still information transmitted (as compared with chance responses)
by the subjects on the second and third guesses following an incorrect
response. "In general, the additional information recovered per message
increases as the degree of analysis of the multiple response data becomes
more exhaustive. Stated otherwise, we recover more information from the
distribution of responses if we utilize the first response following the
initial incorrect reproduction, still more if we utilize the first and
second responses following the initial incorrect reproduction, and still
more if we utilize the first through the third responses following the initial
incorrect reproduction. The magnitude of the information recovered increases
as the number of alternatives per message-unit increases and is, roughly,
independent of message-length (for messages greater than 7 units in length)."
[p. ?, 77, IV] As would be expected, this effect is a decreasing one,
but the decrease is less rapid with larger numbers of alternatives per
message-unit.

10.  Concept  Formation

Let there be eight objects which are triangles or circles, large
or small, and black or red. We may attempt to convey a concept, such as
red triangle, to a subject by showing him the objects one at a time and
stating whether or not they are examples of the desired concept. A positive
instance of the concept red triangle is large red triangle, whereas small
black triangle or large red circle are negative instances. Such experiments
in concept learning have long been performed, and the conclusion has been
drawn that negative instances are of little value in learning the correct
concept. Hovland [37], however, has raised a question about this conclusion -
a question which stems from an information analysis of the situation.
"What is not clear ... is whether the ineffectiveness of negative instances
is primarily attributable to their low value as carriers of information,
or whether it is primarily due to the difficulty of assimilating the
information which they do convey." [p. 461, 37]

Certainly it is clear from the above example that positive and
negative instances do not transmit the same information, since only two
positive ones are required to specify the concept, as compared with six
negative. It is, of course, possible to design a situation where the
negative instances carry as much or more information as the positive ones.
For certain simple general situations, of which the above example is
illustrative, Hovland has given formulae for the total number of positive
and negative instances required to specify the concept. In an experimental
paper, he and Weiss [38] examined the effect of positive and negative
instances when both the number of instances and the amount of information
are held constant, and they conclude that even so the negative instances do
not contribute as effectively to learning. "At the same time the data
disprove the generalization often cited that negative instances have no value
in the learning of concepts. Under appropriate conditions over half of the
Ss were able to reach the correct solution solely on the basis of negative
instances." [p. 181, 38]
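
The eight-object example can be checked by enumeration. The sketch below assumes, as a simplification that is not taken from Hovland, that the subject knows the concept fixes one value on each of exactly two of the three dimensions; all names are hypothetical. It counts the positive and negative instances of red triangle and confirms that either set alone singles out the concept within that assumed class.

```python
from itertools import combinations, product

DIMENSIONS = {
    "shape": ("triangle", "circle"),
    "size": ("large", "small"),
    "color": ("black", "red"),
}
OBJECTS = [dict(zip(DIMENSIONS, values)) for values in product(*DIMENSIONS.values())]


def matches(concept, obj):
    """A concept is a dict of required attribute values."""
    return all(obj[dim] == value for dim, value in concept.items())


# Assumed hypothesis space: one required value on each of exactly two dimensions.
CANDIDATES = [
    dict(zip(dims, values))
    for dims in combinations(DIMENSIONS, 2)
    for values in product(*(DIMENSIONS[d] for d in dims))
]

target = {"color": "red", "shape": "triangle"}
positives = [o for o in OBJECTS if matches(target, o)]
negatives = [o for o in OBJECTS if not matches(target, o)]


def consistent(candidate, pos, neg):
    """True if the candidate accepts every positive and rejects every negative."""
    return all(matches(candidate, o) for o in pos) and not any(
        matches(candidate, o) for o in neg
    )


print(len(positives), len(negatives))
print([c for c in CANDIDATES if consistent(c, positives, [])])
print([c for c in CANDIDATES if consistent(c, [], negatives)])
```

The sketch does not test whether fewer than six negative instances would suffice; that depends on the formulae Hovland gives and on the class of concepts the subject is taken to entertain.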

Bendig [4] conducted an experiment which is closely related to
concept formation, namely, the identification of a concept after the manner
of the game '20 questions.' In the experiment, four questions were employed
to isolate an animal topic. One experimenter asked the questions in fixed
order of another who answered 'yes' or 'no' according to the topic.
Following each question, the subjects were required to guess the concept. The
information transmitted by each question was calculated, and theoretically
each should have conveyed one bit, but in actuality 0.03, 0.91, 0.21, and
0.7[?] bits were transmitted. The central conclusion seemed to be that the
third question was unfortunately phrased, since answers to it failed to
convey much information.

11.  Paired  Associates  Learning

In this section, we shall consider a learning situation where
one class of objects - usually words - known as 'responses' have been
placed in one-to-one correspondence with another class of objects known as
'stimuli.' Initially, the subject knows nothing of the pairing and he can
only guess at the appropriate response to a given stimulus; if he is correct,
he is told this, if not, he is told the correct response. After a number
of repetitions, R, of the stimulus class, the subject begins to learn the
correct pairing, and he obtains a certain number of correct bonds, say C,
out of the total of N. The function C(R) is known as his 'learning curve'
for the paired associates. Several theories, and formulae, for this learning
phenomenon have been put forth which are summarized by Rogers [?4] in a
thesis in which he introduces a new learning theory based in part on
information theory.

He makes two central assumptions. First, he supposes that the
uncertainty which a subject has with respect to the stimulus class after
R repetitions of the stimulus class is a function of R alone. In particular,
he supposes that it is constant - $U_0$ - for the first b repetitions, where
b is a free parameter which tells when the learning begins, and that from
b on it is a linear function of R, i.e.,

$$U = U_0 - A(R - b) \quad \text{for } R > b.$$

Second, let B be the total number of bonds which the subject
knows after R repetitions, which Rogers shows is one less than the expected
value of the observable C. Let k be a stimulus not among the B that are
known and let i be any response not associated with one of the B known
stimuli; then he supposes that the probability that i is the response
when k is given is 1/(N-B). In other words, the subject is assumed to
distribute his response choices without preference over all the available
response elements.

From this second assumption, it is not difficult to obtain an
expression for the uncertainty in terms of N and B. Equating this to the
assumed expression in terms of R gives an equation between B and R, and so
between C and R. This may be solved for C:

$$C = \begin{cases} (N-1)\left[1 - e^{-\delta A (R-b)}\right] + 1, & \text{for } R > b \\ 1, & \text{for } R \le b \end{cases}$$

where $\delta = \log_e 2$. It has long been noted that many learning data are
approximately fit by such an exponential learning curve, though in general
this has been an empirical observation which was not deduced from other
assumptions.
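
An exponential learning curve of the kind just described can be sketched numerically. The exact form used here is a reconstruction and should be read as an assumption rather than as Rogers's own formula: C stays at 1 (the one correct guess expected by chance) until learning begins at b, then rises toward N, with the rate A expressed in bits per repetition. The function and parameter names are hypothetical.

```python
import math


def expected_correct(repetitions, n_pairs, rate, onset):
    """Expected number correct C(R) under an assumed exponential form.

    Assumptions (reconstructed, not verbatim from Rogers):
      C = 1 up to the onset b, and
      C = (N - 1) * (1 - exp(-delta * A * (R - b))) + 1 afterwards,
      with delta = ln 2 so that A is a rate in bits per repetition.
    """
    if repetitions <= onset:
        return 1.0
    delta = math.log(2.0)
    gain = 1.0 - math.exp(-delta * rate * (repetitions - onset))
    return (n_pairs - 1) * gain + 1.0


# Hypothetical parameters: 10 pairs, learning begins after 3 repetitions.
for r in range(0, 13, 3):
    print(r, round(expected_correct(r, n_pairs=10, rate=0.5, onset=3), 2))
```

With these parameters the remaining distance to N halves every 1/A = 2 repetitions once learning has begun, which is the defining property of an exponential curve and the reason it cannot show an initial 'S' bend.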

To test the merits of this theory, Rogers drew certain conclusions
from it which could be confronted by data, and these conclusions were
sustained by his data. [...] were performed:

1) Correlated Structure. Stimuli - playing cards having two easily
recognized dimensions, suits and denominations - were associated with
nonsense syllables of the form consonant-vowel-consonant in a correlated
manner. The first letter always corresponded to the denomination and the
last to the suit.

2) Unstructured. Pictures of diverse household objects were paired in an
arbitrary manner with nonsense syllables.

3) Uncorrelated Structure. The same materials as in 1 were used (so both the
stimulus class and the response class were structured) but there was no
systematic relation in the pairing between the stimulus class and the
response class.

He then examined what two classical theories of learning - Gestalt and the
transfer theory of meaning - and his own information theory of learning
predicted as to the learning rates in these three cases. Gestalt theory,
according to his interpretation, ranks them 1, 3, 2 in order of increasing
difficulty, transfer theory gives an ordering of 1, 2, 3, while information
theory predicts that 1 and 2 should be equally easy and 3 more difficult.
The data supported only the last prediction.

Attempts to fit the learning curve to the data were for the most
part successful, although one can note a consistent 'S' character to the
data, which, of course, the exponential does not possess. He points out
that if the linear assumption were replaced by an appropriate non-linear one,
one could easily produce a learning curve with an 'S' shape - or, we might
add, with practically any other shape, for that matter.


Appendix:  The  Continuous  Theory

Much communication can best be thought of as the transmission
of a continuous signal and not as a sequence of temporally ordered selections
from a finite set of possible elements. For the most part, as we have seen,
the continuous theory has been of little importance in behavioral
applications, though it is of considerable importance in electrical ones. We
shall, therefore, only sketch the theory briefly. Our presentation follows
Shannon's [67] closely.


A.1  The  Continuous  Source

A source is said to be continuous if, in effect, it makes but
one selection from a continuum of elements; specifically, if it chooses
one number from the set of all real numbers. We shall suppose that this
selection is characterized by the probability distribution p(x) over the
real numbers x. Since p is a distribution, $\int_{-\infty}^{\infty} p(x)\,dx = 1$,
and furthermore for any $\epsilon > 0$, no matter how small, we can find
finite a and b such that

$$1 - \epsilon < \int_a^b p(x)\,dx \le 1.$$

Thus, for such a and b we may divide the interval from a to b into n equal
intervals, and we can treat each of the intervals as an element from a
finite set, with probability $\int p(x)\,dx$, taken over that interval, of
being selected. All the continuum not in a to b is an n + 1st element with
probability $1 - \int_a^b p(x)\,dx$. Thus we have approximated the continuous
source by a discrete one and for each n we can compute a corresponding
entropy $H_n$. As we let n approach infinity, the approximation is better
and better, but unfortunately $H_n$ also approaches infinity. This, of
course, is reasonable considering the basis of the discrete entropy concept,
but that does not make the approach any more satisfactory as a way to
continuous sources.

In such situations it is very often the case that the difference
between the quantity desired and another quantity which tends to infinity
with n will itself tend to a finite limit. If this second quantity can be
chosen to be the same for all sources, then the resulting differences are
perfectly acceptable comparators for the continuous source. As before,
we choose a and b and we divide the interval from a to b into n equal
intervals. Each of these intervals is of length $\Delta x = (b-a)/n$. Whereas
before we tried to generalize

$$-\sum_{i=1}^{n} p(x_i)\,\Delta x\,\log_2\left[p(x_i)\,\Delta x\right]$$

and got into trouble, we now examine

$$\log_2 \Delta x - \sum_{i=1}^{n} p(x_i)\,\Delta x\,\log_2\left[p(x_i)\,\Delta x\right].$$

It is not difficult to show that

$$\lim_{\substack{b \to \infty \\ a \to -\infty}}\ \lim_{\substack{n \to \infty \\ \Delta x \to 0}} \left\{ \log_2 \Delta x - \sum_{i=1}^{n} p(x_i)\,\Delta x\,\log_2\left[p(x_i)\,\Delta x\right] \right\} = -\int_{-\infty}^{\infty} p(x)\,\log_2 p(x)\,dx.$$


This quantity, which is denoted H(x), is called the entropy of a continuous
source. It is well to keep in mind that the continuous entropy is not
an exact analogue of the discrete entropy, and so certain differences in
properties may be expected. The surprising thing is how many of the results
are independent of the baseline from which the discrete entropy is measured.
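
The divergence of the discrete entropy, and the finite limit obtained after adding $\log_2 \Delta x$, can be checked numerically. The following is a minimal sketch, not part of the original report: it assumes a normal density with unit standard deviation as the test case, and the names `gaussian_pdf` and `discrete_entropy` are hypothetical helpers introduced only for illustration.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Density of a zero-mean normal distribution (an assumed test case)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def discrete_entropy(pdf, a, b, n):
    """Entropy H_n in bits of the n-cell approximation on [a, b].

    Each cell is given probability p(x_i) * dx, with x_i the cell midpoint.
    The mass outside [a, b] is ignored, which is harmless for a wide interval.
    Returns the pair (H_n, dx).
    """
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        mass = pdf(a + (i + 0.5) * dx) * dx
        if mass > 0.0:
            h -= mass * math.log2(mass)
    return h, dx

# H_n grows without bound, while H_n + log2(dx) settles down.
for n in (10, 100, 1000, 10000):
    h_n, dx = discrete_entropy(gaussian_pdf, -10.0, 10.0, n)
    print(n, round(h_n, 4), round(h_n + math.log2(dx), 4))

# Closed form of the continuous entropy for the normal density (sigma = 1).
print(math.log2(math.sqrt(2.0 * math.pi * math.e)))
```

In this sketch the second printed column increases by about $\log_2 10$ bits each time n is multiplied by ten, while the third column approaches the closed-form value of roughly 2.05 bits.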
If there are two arguments x and y to the distribution (as in the
case of noise), the joint and conditional entropies are defined by

$$H(x,y) = -\iint p(x,y)\,\log_2 p(x,y)\,dx\,dy,$$

$$H_x(y) = -\iint p(x,y)\,\log_2 \frac{p(x,y)}{p(x)}\,dx\,dy,$$

$$H_y(x) = -\iint p(x,y)\,\log_2 \frac{p(x,y)}{p(y)}\,dx\,dy,$$

where

$$p(x) = \int p(x,y)\,dy,$$

$$p(y) = \int p(x,y)\,dx.$$
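
These definitions can be exercised on a concrete density. The sketch below is an illustration only: it assumes a bivariate normal density with unit variances and correlation 0.6 (a choice made here, not in the report), approximates the integrals by midpoint sums on a finite grid, and uses hypothetical helper names `joint_pdf` and `entropies`.

```python
import math

def joint_pdf(x, y, r=0.6):
    """Bivariate normal density: zero means, unit variances, correlation r."""
    q = (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def entropies(pdf, lo=-8.0, hi=8.0, n=200):
    """Return (H(x,y), H(x), H_x(y)) in bits, using midpoint sums on [lo, hi]^2."""
    d = (hi - lo) / n
    pts = [lo + (i + 0.5) * d for i in range(n)]
    grid = [[pdf(x, y) for y in pts] for x in pts]
    px = [sum(row) * d for row in grid]  # p(x) = integral of p(x,y) over y
    h_xy = 0.0
    h_cond = 0.0
    for i, row in enumerate(grid):
        for p in row:
            if p > 0.0:
                h_xy -= p * math.log2(p) * d * d
                h_cond -= p * math.log2(p / px[i]) * d * d
    h_x = -sum(p * math.log2(p) * d for p in px if p > 0.0)
    return h_xy, h_x, h_cond

h_xy, h_x, h_cond = entropies(joint_pdf)
print("H(x,y) =", round(h_xy, 4))
print("H(x)   =", round(h_x, 4))
print("H_x(y) =", round(h_cond, 4))
```

Because this particular density is symmetric in x and y, H(y) equals H(x), so the printed values also show the conditional entropy $H_x(y)$ falling below H(y) when the two variables are correlated.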

Many of the theorems of the discrete case carry over - usually
quite directly - to the continuous case, but in addition there are certain
new theorems which rest heavily on the existence of a coordinate system.
We list some of the more important ones, of which the first is familiar
from the discrete case:


1. $H(x,y) \le H(x) + H(y)$,

   $H(x,y) = H(x) + H_x(y) = H(y) + H_y(x)$,

   $H_x(y) \le H(y)$.

2. If p(x) = 0 except on an interval of length v, then H(x) is
a maximum ($= \log_2 v$) when p(x) = 1/v for x in the interval.

3. Of the class of all continuous one-dimensional distributions
with variance $\sigma^2$, the normal, or Gaussian, is the one having maximum
entropy. The value of the maximum is $\log_2 (2\pi e)^{1/2}\sigma$.

4. Of the class of all continuous one-dimensional distributions
with mean a > 0 and with p(x) = 0 for x < 0, the exponential is the one
having maximum entropy. The value of the maximum is $\log_2 ea$.


5. Unlike the discrete case, in which entropy measures the
randomness in an absolute way, the continuous entropy is a measure which
is relative to a coordinate system. If the coordinate system is changed,
the entropy is changed. This is not serious, however, since both the channel
capacity and the rate of information transfer depend on the difference
of two entropies, and so they are invariant under coordinate transformation.
Reich [83] states that he has shown that the definition of information
rate used by Shannon is the only one of a broad class of possible definitions
which is invariant under coordinate transformation.
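
The maximum values quoted in theorems 2 to 4, and the coordinate dependence described in theorem 5, can be checked numerically. This is a hedged sketch rather than anything taken from the report: the parameter values (v = 4, σ = 1.5, a = 2, and a change of scale by a factor of 1000) are arbitrary assumptions, and `entropy_bits`, `uniform`, `gaussian`, and `exponential` are hypothetical helpers.

```python
import math

def entropy_bits(pdf, lo, hi, n=20000):
    """Midpoint-rule estimate of -integral of p(x) log2 p(x) dx over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0.0:
            total -= p * math.log2(p) * dx
    return total

def uniform(v):
    return lambda x: 1.0 / v if 0.0 <= x <= v else 0.0

def gaussian(sigma):
    c = sigma * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-x * x / (2.0 * sigma * sigma)) / c

def exponential(a):
    return lambda x: math.exp(-x / a) / a if x >= 0.0 else 0.0

v, sigma, a = 4.0, 1.5, 2.0
h_uniform = entropy_bits(uniform(v), 0.0, v)
h_gauss = entropy_bits(gaussian(sigma), -12.0 * sigma, 12.0 * sigma)
h_exp = entropy_bits(exponential(a), 0.0, 60.0 * a)
print(h_uniform, math.log2(v))                                    # theorem 2
print(h_gauss, math.log2(math.sqrt(2.0 * math.pi * math.e) * sigma))  # theorem 3
print(h_exp, math.log2(math.e * a))                               # theorem 4

# Theorem 5: rescale the coordinate by a factor of 1000 (say, volts to millivolts).
scale = 1000.0
h_gauss_scaled = entropy_bits(gaussian(scale * sigma), -12.0 * scale * sigma, 12.0 * scale * sigma)
h_unit = entropy_bits(gaussian(1.0), -12.0, 12.0)
h_unit_scaled = entropy_bits(gaussian(scale), -12.0 * scale, 12.0 * scale)
print(h_gauss_scaled - h_gauss, math.log2(scale))          # each entropy shifts by log2(scale)
print(h_gauss - h_unit, h_gauss_scaled - h_unit_scaled)    # the difference does not
```

Each entropy shifts by the logarithm of the scale factor, while the difference of two entropies measured in the same coordinates is unchanged, which is the sense in which capacity and rate are invariant.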


4.2 The Channel Capacity


As in the discrete noisy case, the channel capacity C is defined
to be the maximum rate of transmission $R = H(x) - H_y(x)$ obtained by considering
all possible distributions. This is easily shown to be

$$C = \lim_{T \to \infty} \max_{p(x)} \frac{1}{T} \iint p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy.$$

One particularly important case in applications is that in which the noise
is simply added to the signal and is independent of it. In that case the
entropy of the noise can be computed. If we denote it by H(n), then

$$C = \max H(y) - H(n).$$

Of course, if there are restraints on the class of admissible signals,
the maximization is taken subject to these restraints.

A simple, but very important, electrical application of the
above theorem is to the case of a channel which has a bandwidth of W
cycles per second (e.g., a telephone which will pass from 500 to 3,500
cycles per second has a bandwidth of 3,000 cycles per second), in which
the transmitter has an average power output of P and the noise is white
thermal noise (i.e., all frequencies are equally represented) of average
power N. In this case the channel capacity in bits per second is

$$C = W \log_2 \left(1 + \frac{P}{N}\right).$$
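
As a small worked example of this formula, the sketch below evaluates the capacity of a 3,000 cycle-per-second channel at a few signal-to-noise ratios. The ratios themselves are assumed for illustration and do not come from the report; `channel_capacity` is a hypothetical helper name.

```python
import math

def channel_capacity(bandwidth, signal_power, noise_power):
    """Capacity in bits per second: C = W * log2(1 + P/N).

    bandwidth is W in cycles per second; signal_power and noise_power are the
    average powers P and N, in any common unit.
    """
    if bandwidth <= 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("need W > 0, P >= 0, N > 0")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)

# A telephone-like channel with W = 3,000 cycles per second.
for ratio in (1.0, 10.0, 100.0, 1000.0):
    print(ratio, round(channel_capacity(3000.0, ratio, 1.0), 1))
```

When the signal and noise powers are equal the capacity is exactly W bits per second, and each tenfold increase in P/N thereafter adds only about 3.3 W bits per second, so capacity grows logarithmically in power but linearly in bandwidth.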


4.3 Rate of Transmission


"In the case of a discrete source of information we were able to
determine a definite rate of generating information, namely the entropy of
the underlying stochastic process. With a continuous source the situation
is considerably more involved. In the first place a continuously variable
quantity can assume an infinite number of values and requires, therefore,
an infinite number of binary digits for exact specification. This means that
to transmit the output of a continuous source with exact recovery at the
receiving point requires, in general, a channel of infinite capacity (in
bits per second). Since, ordinarily, channels have a certain amount of
noise, and therefore a finite capacity, exact transmission is impossible.

"This, however, evades the real issue. Practically, we are not
interested in exact transmission when we have a continuous source, but only
in transmission to within a certain tolerance. The question is, can we
assign a definite rate to a continuous source when we require only a certain
fidelity of recovery, measured in a suitable way. Of course, as the fidelity
requirements are increased the rate will increase. It will be shown that
we can, in very general cases, define such a rate, having the property that it
is possible, by properly encoding the information, to transmit it over
a channel whose capacity is equal to the rate in question, and satisfy the
fidelity requirements. A channel of smaller capacity is insufficient."

I r>  . V ■:  5(51 


The noise character of the whole system is, as before, given
by a distribution p(x,y) which states the probability that the signal y
is received when x is sent. The fidelity of the system is an
evaluation of how different y is on the average from x. It is assumed to
be a function of the noise, that is, if it is measured by a real number it
can be written in the form v(p(x,y)). Under quite broad conditions, which
we shall not attempt to state here (see [87]), it can be shown that v
can be represented as

$$v(p(x,y)) = \iint p(x,y)\,\rho(x,y)\,dx\,dy.$$

The real-valued function $\rho(x,y)$ is essentially a measure of the difference
between x and y, and in the fidelity measure it is weighted according to
the probability density of the joint occurrence of x and y. It may be
illuminating to consider two very common electrical criteria of fidelity.
The first is the mean-square criterion,

$$\rho(x,y) = \frac{1}{T} \int_0^T \left[x(t) - y(t)\right]^2 dt,$$

and the second is the absolute error criterion, namely,

$$\rho(x,y) = \frac{1}{T} \int_0^T \left|x(t) - y(t)\right| dt.$$
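
Both criteria are simple time averages and are easy to evaluate for sampled signals. The sketch below is illustrative only: the sent signal (a sine wave) and the received signal (the same wave displaced by a constant 0.1) are assumptions chosen so that the answers are obvious, and the function names are hypothetical.

```python
import math

def mean_square_error(x, y, T, n=1000):
    """(1/T) * integral over [0, T] of [x(t) - y(t)]^2 dt, by the midpoint rule."""
    dt = T / n
    return sum((x((k + 0.5) * dt) - y((k + 0.5) * dt)) ** 2 for k in range(n)) * dt / T

def mean_absolute_error(x, y, T, n=1000):
    """(1/T) * integral over [0, T] of |x(t) - y(t)| dt, by the midpoint rule."""
    dt = T / n
    return sum(abs(x((k + 0.5) * dt) - y((k + 0.5) * dt)) for k in range(n)) * dt / T

def sent(t):
    return math.sin(2.0 * math.pi * t)

def received(t):
    # Assumed distortion: a constant offset of 0.1 added to the sent signal.
    return sent(t) + 0.1

print(mean_square_error(sent, received, T=1.0))    # close to 0.1 ** 2
print(mean_absolute_error(sent, received, T=1.0))  # close to 0.1
```

The two criteria penalize the same error differently: the mean-square criterion weights large excursions more heavily, while the absolute criterion weights all deviations in proportion to their size.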

Now, the rate R of generating information corresponding to a given
quality of reproduction (fidelity) v is defined to be the minimum R which
is obtained by varying $p_x(y)$ with v held constant, i.e.,

$$R = \min_{p_x(y)} \iint p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy$$

subject to

$$v = \iint p(x,y)\,\rho(x,y)\,dx\,dy.$$
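
The minimization rarely has a closed form, but one standard case does, and it may help fix ideas. The sketch below is an assumption-laden illustration that is not stated in this survey: for a source emitting independent normal samples of variance σ², judged by the mean-square criterion with tolerance v, the minimum is known to be ½ log₂(σ²/v) bits per sample when v < σ², and zero otherwise. The function name `gaussian_rate` is hypothetical.

```python
import math

def gaussian_rate(source_variance, v):
    """Rate in bits per sample for an independent normal source under a
    mean-square fidelity requirement v (a standard closed form, assumed here).
    """
    if source_variance <= 0.0 or v <= 0.0:
        raise ValueError("variance and tolerance must be positive")
    if v >= source_variance:
        # Reproducing every sample by the mean already meets the tolerance.
        return 0.0
    return 0.5 * math.log2(source_variance / v)

# Tightening the tolerance raises the rate, without bound as v approaches 0.
for v in (1.0, 0.25, 0.01):
    print(v, round(gaussian_rate(1.0, v), 4))
```

This agrees with the qualitative remarks quoted earlier: the rate increases as the fidelity requirement is tightened, and exact reproduction (v tending to zero) would require an unbounded rate.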

With this definition, and with that of channel capacity given in section
4.2, it can be shown that if a source has a rate R for a valuation of
fidelity v, then it is possible to encode the output of the source and to



Bibliography

The following group of papers and books which were
reviewed in the preparation of this report include the central
works on the theory of information and all of the works which
we have been able to find (as of early 1954) concerned with
its applications to psychology. The bibliography prepared by
Stumpers [95, 96] is more general than ours in that it covers
the whole area of Cybernetics and the applications of information
theory in engineering and in the several behavioral sciences
(as of early 1953), but it is not so complete as ours for
psychological applications.




1. Aborn, M. and Rubenstein, H., "Information Theory and Immediate Recall," Journal of Experimental Psychology, 44, 1952, 260-266.

2. Bell, D. A., "The 'Internal Information' of English Words," Communication Theory (ed. Willis Jackson), Academic Press, Inc., 1953, 383-391.

3. Bendig, A. W. and Hughes, J. B., "Effect of Amount of Verbal Anchoring and Number of Rating-Scale Categories Upon Transmitted Information," Journal of Experimental Psychology, 46, 1953, 87-90.

4. Bendig, A. W., "Twenty Questions: An Information Analysis," Journal of Experimental Psychology, 46, 1953, 345-348.

5. Blachman, N. M., "Minimum-Cost Encoding of Information," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 3, 1954, 139-149.

6. Carnap, R. and Bar-Hillel, Y., An Outline of a Theory of Semantic Information, Research Laboratory of Electronics Technical Report 247.

7. Cherry, E. C., "A History of the Theory of Information," Proceedings of the Institution of Electrical Engineers, 98, III, 1951, 383-393.

8. ________, "A History of the Theory of Information," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953.

9. Christie, L. S. and Luce, R. D., Suggestions for the Analysis of Reaction Times and Simple Choice Behavior, dittoed paper, 1953.

10. Crossman, E. R. F. W., "Entropy and Choice Time: The Effect of Frequency Unbalance on Choice-Response," Quarterly Journal of Experimental Psychology, 5, 1953, 41-51.

11. Dewey, G., Relative Frequency of English Speech Sounds, Harvard University Press, Cambridge, 1923.

12. Dolansky, L. and Dolansky, M. P., Table of log2 1/p, p log2 1/p, and p log2 1/p + (1-p) log2 1/(1-p), Technical Report 227, Research Laboratory of Electronics, M.I.T., 1952.

13. Elias, Peter, "A Note on Autocorrelation and Entropy," Proceedings of the Institute of Radio Engineers, 39, 1951, 839.

14. Fano, R. M., The Transmission of Information, Research Laboratory of Electronics Technical Report 65, M.I.T., 1949.

15. ________, The Transmission of Information - II, Research Laboratory of Electronics Technical Report 149, M.I.T., 1950.

16. ________, "The Information Theory Point of View in Speech Communication," Journal of the Acoustical Society of America, 22, 1950, 691-696.


17  o a Infestation  11s 

&.r¥r:i53rr — 


ratvae,  aittosd  paper. 


18. Felton, W. W., Fritz, E., and Grier, G. W., Jr., Communication Measurements at the Langley Air Force Base, Human Resources Research Laboratory Report.

19. Frick, F. C. and Miller, G. A., "A Statistical Description of Operant Conditioning," American Journal of Psychology, 64, 1951, 20-36.

20. Frick, F. C. and Sumby, W. H., "Control Tower Language," Journal of the Acoustical Society of America, 24, 1952, 595-596.

21. Gabor, D., "Theory of Communication," Journal of the Institution of Electrical Engineers, 93, III, 1946, 429-457.

22. ________, "New Possibilities in Speech Transmission," Journal of the Institution of Electrical Engineers, 94, III, 1947, 369-390.

23. ________, Lectures on Communication Theory, Research Laboratory of Electronics Technical Report 238, M.I.T., 1952.

24. ________, "Communication Theory, Past, Present, and Prospective," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953, 2-4.

25. ________, "A Summary of Communication Theory," Communication Theory (ed. Willis Jackson), Academic Press, New York, 1953.

26. ________, "Communication Theory and Physics," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953, 48-59.

27. Garner, W. R. and Hake, H. W., "The Amount of Information in Absolute Judgments," Psychological Review, 58, 1951, 446-459.

28. Garner, W. R., "An Informational Analysis of Absolute Judgments of Loudness," Journal of Experimental Psychology, 46, 1953, 373-380.

29. Goldman, S., Information Theory, Prentice-Hall, New York, 1953.

30. Hake, H. W. and Garner, W. R., "The Effect of Presenting Various Numbers of Discrete Steps on Scale Reading Accuracy," Journal of Experimental Psychology, 42, 1951, 358-366.


31. Hake, H. W. and Hyman, R., "Perception of the Statistical Structure of a Random Series of Binary Symbols," Journal of Experimental Psychology, 45, 1953, 64-74.

32. Halsey, R. M. and Chapanis, A., "On the Number of Absolutely Identifiable Spectral Hues," Journal of the Optical Society of America, 41, 1951, 1057-1058.

33. Hartley, R. V. L., "Transmission of Information," Bell System Technical Journal, 7, 1928, 535-563.

34. Hick, W. E., "On the Rate of Gain of Information," Quarterly Journal of Experimental Psychology, 4, 1952, 11-26.

35. ________, "Why the Human Operator?" Transactions of the Society of Instrument Technology, 4, 1952, 67-77.

36. ________, "Information Theory in Psychology," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953, 130-133.

37. Hovland, C. I., "A 'Communication Analysis' of Concept Learning," Psychological Review, 59, 1952, 461-472.

38. Hovland, C. I. and Weiss, W., "Transmission of Information Concerning Concepts through Positive and Negative Instances," Journal of Experimental Psychology, 45, 1953, 175-182.

39. Howes, D. H., The Definition and Measurement of Word Probability, Ph.D. Thesis, Harvard University, 1950.

40. Howes, D. H. and Solomon, R. L., "Visual Duration Threshold as a Function of Word Probability," Journal of Experimental Psychology, 41, 1951, 401-410.

41. Hyman, R., "Stimulus Information as a Determinant of Reaction Time," Journal of Experimental Psychology, 45, 1953, 188-196.

42. Jackson, Willis (Editor), "Report of Proceedings, Symposium on Information Theory, London, 1950," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953.

43. ________, Communication Theory, Academic Press Inc., New York, 1953.

44. Jacobson, H., "The Informational Capacity of the Human Ear," Science, 112, 1950, 143-144.

45. ________, "Information and the Human Ear," Journal of the Acoustical Society of America, 23, 1951, 463-471.

46. ________, "The Informational Capacity of the Human Eye," Science, 113, 1951, 292-293.


47. King-Ellison, Patricia and Jenkins, J. J., Visual Duration Threshold as a Function of Word Frequency: A Replication, The Role of Language in Behavior, Technical Report Number 5, University of Minnesota, Contract N8 onr-66216.

48. Klemmer, E. T. and Frick, F. C., "Assimilation of Information from Dot and Matrix Patterns," Journal of Experimental Psychology, 45, 1953, 15-19.

49. Klemmer, E. T. and Muller, P. F., Jr., The Rate of Handling Information: Key Pressing Responses to Light Patterns, HFORL Memo Report 34, 1953.

50. Krulee, G. K. and Sinclair, E. J., "Some Behavioral Implications of Information Theory," Report 4119, Naval Research Laboratory, Washington, D.C., 1953, 11 pp.

51. Kullback, S. and Leibler, R. A., "On Information and Sufficiency," Annals of Mathematical Statistics, 22, 1951, 79-86.

52. Kullback, S., "An Application of Information Theory to Multivariate Analysis," Annals of Mathematical Statistics, 23, 1952, 88-102.

53. Licklider, J. C. R. and Miller, G. A., "The Perception of Speech," Handbook of Experimental Psychology (S. S. Stevens, editor), John Wiley and Sons, New York, 1951, 1040-1074.

54. MacKay, D. M., "The Nomenclature of Information Theory," Cybernetics (ed. Heinz von Foerster), Josiah Macy, Jr. Foundation, New York, 1952, 222-235; and Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953.

55. ________, "Quantal Aspects of Scientific Information," Philosophical Magazine (series 7), 41, 1950, 289-311; and Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953, 60-80.

56. ________, "In Search of Basic Symbols," Cybernetics (ed. Heinz von Foerster), Josiah Macy, Jr. Foundation, New York, 1952, 181-221.

57. Mandelbrot, Benoit, "Contribution à la Théorie Mathématique des Jeux de Communication," Publications de l'Institut de Statistique de l'Université de Paris, 2, 1953, 1-124.

58. ________, "An Informational Theory of the Statistical Structure of Language," Communication Theory (ed. Willis Jackson), Academic Press, New York, 1953.


59. ________, "Structure Formelle des Textes et Communication," Word, 10, 1954, 1-27.




61. McGill, W. J., Multivariate Transmission of Information and its Relation to Analysis of Variance, Report 32, Human Factors Operations Research Laboratories, M.I.T., 1953.

62. ________, "Multivariate Information Transmission," Psychometrika, to be published; and Research Laboratory of Electronics and Lincoln Laboratory Technical Memorandum No. 48, M.I.T., 1953, 17 pp.

63. McMillan, Brockway, "The Basic Theorems of Information Theory," Annals of Mathematical Statistics, 24, 1953, 196-219.

64. Merkel, J., "Die zeitlichen Verhältnisse der Willensthätigkeit," Philosophische Studien, 2, 1885, 73-127.

65. Miller, G. A. and Frick, F. C., "Statistical Behavioristics and Sequences of Responses," Psychological Review, 56, 1949, 311-324.

66. Miller, G. A. and Selfridge, J. A., "Verbal Context and the Recall of Meaningful Material," American Journal of Psychology, 63, 1950, 176-185.

67. Miller, G. A., "Speech and Language," Handbook of Experimental Psychology (S. S. Stevens, editor), John Wiley and Sons, New York, 1951, 789-810.

68. ________, Language and Communication, McGraw-Hill, New York, 1951.

69. Miller, G. A., Heise, G. A., and Lichten, W., "The Intelligibility of Speech as a Function of the Context of the Test Materials," Journal of Experimental Psychology, 41, 1951, 329-335.

70.  Millet*,  0*  Ao,  A Hcrte  oo  ■Ui&  Sanyline  QjrUgitmri  one  f jag  Sbronota-aagsaar 
Heasoara  »iC  ijapubllfl’yi  “ 

71. ________, "What is Information Measurement?," American Psychologist, 8, 1953, 3-11.

72. ________, "Communication," Annual Review of Psychology, 5 (Stone, C. P. and McNemar, Q., editors), Annual Reviews, Inc., Stanford, 1954, 401-420.

73. Miller, G. A. and Madow, W. G., On the Maximum Likelihood Estimate of the Shannon-Wiener Measure of Information (in preparation), 1954.

74. Newman, E. B., "Computational Methods Useful in Analyzing Series of Binary Data," American Journal of Psychology, 64, 1951, 252-262.

75. Newman, E. B. and Gerstman, L. J., "A New Method for Analyzing Printed English," Journal of Experimental Psychology, 44, 1952, 114-125.

76. Pollack, Irwin, "The Information of Elementary Auditory Displays," Journal of the Acoustical Society of America, 24, 1952, 745-749.


7?  * j SoT-  iijwjlmllatian.  of  .^^^qtially--fiw£rwd  Ipfornatloau,  1*  Kethodcdcg 

=^r^*YS(,g^g?iF^ve  ?«  ^^ctTixF- it'nte  of  ciKtKjtl^ 

JufAau  tt£  SSpauaafl.  ftaaa  &S*csrrcf**j  Tss**srs-ci 

Tu^rt^^nS^iton,-  1952,. 




78. Pratt, Fletcher, Secret and Urgent, Blue Ribbon Books, Garden City, 1942.

79. Proceedings of the London Symposium on Information Theory, 1950. See Jackson [42].

80. Proceedings of the London Symposium on Information Theory, 1952. See Jackson [43].

81. Quastler, Henry (editor), Essays on the Use of Information Theory in Biology, University of Illinois Press, Urbana, 1953.

82. Quastler, Henry and Wulff, V. J., Human Performance in Information Transmission, Part One: Simple Sequential Routinized Tasks, 1954.

83. Reich, E., "Definition of Information," Proceedings of the Institute of Radio Engineers, 39, 1951, 290.

84. Rogers, M. S., An Application of Information Theory to the Problem of the Relationship Between Meaningfulness of Material and Performance in a Learning Situation, Ph.D. Thesis, Princeton University, 1952, mimeographed.

85. Rogers, M. S. and Green, B. F., The Moments of Sample Information When the Alternatives are Equally Likely, dittoed paper, Psychological Laboratories, Harvard University.

86. ________, Tables of the Mean and Variance of Sample Information When the Alternatives are Equally Likely, mimeographed, Psychological Laboratories, Harvard University, Cambridge.

87. Shannon, C. E., "A Mathematical Theory of Communication," Bell System Technical Journal, 27, 1948, 379-423 and 623-656.

88. Shannon, C. E. and Weaver, Warren, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949.

89. Shannon, C. E., "Communication Theory of Secrecy Systems," Bell System Technical Journal, 28, 1949, 656-715.

90. ________, "The Redundancy of English," Cybernetics (ed. Heinz von Foerster), Josiah Macy, Jr. Foundation, New York, 1951, 123-158.

91. ________, "Prediction and Entropy of Printed English," Bell System Technical Journal, 30, 1951, 50-64.

92. ________, "Communication Theory - Exposition of Fundamentals," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953, 44-47.

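93. ________, "General Treatment of the Problem of Coding," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953.

94. ________, "The Lattice Theory of Information," Transactions of the Institute of Radio Engineers, Professional Group on Information Theory, 1, 1953.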

9>,  Svs^ar**  f.  I,.,  *i  BS&Lia 
Qtnwi'wtiatij B TrmijMur? 
local  Ortap  ca  Tnfcin. 


i -t  Ii;fo.j»«.feS.<»  -- 
, y.<xT  the  Ingfel-feafe:„y 
5 ^/T  Theory, 


1«.T-  yheosgy»  Ret*- 


97. Thorndike, E. L. and Lorge, I., The Teacher's Word Book of 30,000 Words, Bureau of Publications, Teachers College, Columbia University, New York, 1944.


96.  iranwaiioai  of  taw  Ira  hit-  - .-I  Haalo  Eagjftfrg.gu 
Inrwmetiao  7ba*aryt  ~X  ' 'J^iiasui  • 

1953?  3s  1?^- 

99. Wiener, Norbert, Cybernetics, John Wiley and Sons, New York, 1948.

100. ________, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, John Wiley and Sons, New York, 1949.

101. Wilks, S. S., "The Likelihood Test of Independence in Contingency Tables," Annals of Mathematical Statistics, 6, 1935, 190-196.

102. Zipf, G. K., Human Behavior and the Principle of Least Effort, Addison-Wesley Press, Cambridge, 1949.


